FUNDAMENTAL LIMIT THEOREMS OF PROBABILITY THEORY¹
By M. Loève²
University of California, Berkeley 


no sooner is Proteus caught 
than he changes his shape 


1. Introduction. The fundamental limit theorems of Probability theory may 
be classified into two groups. One group deals with the problem of limit laws 
of sequences of sums of random variables, the other deals with the problem of 
limits of random variables, in the sense of almost sure convergence, of such 
sequences. These problems will be labelled, respectively, the Central Limit Prob- 
lem (CLP) and the Strong Central Limit Problem (SCLP). Like all mathemati- 
cal problems, the CLP and SCLP are not static; as answers to old queries are 
discovered they experience the usual development and new problems arise. The 
development consists in (i) simplifying proofs and forging general tools out of 
the special ones (ii) sharpening and strengthening results (iii) finding general 
notions behind the results obtained and extending their domains of validity. 
Analysis of this growth will put in relief the role and the interconnections of the 
fundamental limit theorems. 

Summary. The growth of the CLP for independent summands can be divided 
into three (overlapping) periods. The first covers the Bernoulli case and the 
corresponding limit theorems of Bernoulli, de Moivre and Poisson. The first two 
theorems gave rise to the notions—from which the classical CLP stems—of 
the Law of Large Numbers (LLN) and of Normal Convergence (NC). Poisson’s 
approach belongs to the set-up of the modern CLP. 

The second period extends over two centuries and is devoted to the extension 
of the domains of validity of LLN and NC. This is the classical CLP period. 
Lyapunov’s crucial work, submitted to the above treatment, led to the discovery 
of the natural boundaries of these domains by Lindeberg, Kolmogorov, Feller 
and P. Lévy. 

However, the LLN and NC problems are but two particular cases of the 
general problem of limit laws of sequences of sums of independent random 
variables. The coming into sight and the solution of this problem—the third 
period of the CLP—covers less than ten years. The tools forged for the classical 
CLP proved to be powerful enough and the final solution is due to P. Lévy, 
Khintchine, Gnedenko and Doeblin. 


1 This paper was presented to the New York meeting of the Institute of Mathematical 
Statistics on December 27, 1949. 

Editor’s Note: The Institute of Mathematical Statistics has formed a Committee on 
Special Invited Papers to invite lecturers to deliver expository addresses to the Institute 
with the understanding that the Special Invited Papers are to be published in the Annals 
of Mathematical Statistics. This paper is the first one invited by the Committee. 

2 This work is supported in part by the Office of Naval Research.
The CLP for dependent variables started with so-called Markov chains.
The study of their limit properties is due essentially to Markov, S. Bernstein
and Doeblin. For more general forms of dependence the LLN and NC problems
were investigated by P. Lévy and Loève after the crucial work of S. Bernstein.
The modern CLP was considered only recently (Loève).

The SCLP stems from the strengthening by Borel of the Bernoulli theorem 
and the sharpening of Borel’s result by Khintchine. They gave rise to the
notions of Strong Law of Large Numbers (SLLN) and of the Law of the Iterated
Logarithm (LIT).³ The domains of validity were extended to their boundaries
by Kolmogorov, P. Lévy and Feller. In the case of dependence, results are due
to G. D. Birkhoff, P. Lévy, W. Doeblin, and Loève. However, the SCLP has
not attained, at present, the harmonious development of the CLP. 

Notations. Let $\mathcal{L}(X)$ be the law of a (real) random variable (r.v.) X. The law
is defined by the distribution function (d.f.) $F(x) = P(X < x)$. As is well known,
$\mathcal{L}(X)$ is determined by the characteristic function (ch.f.)

$$f(u) = \int e^{iux}\,dF(x), \qquad -\infty < u < +\infty.$$


When a r.v. possesses subscripts, the same subscripts will be used for its d.f. 
and ch.f. EX will denote the expectation of X: 


$$EX = \int x\,dF(x),$$

and $\sigma^2(X)$ will denote the variance of X:

$$\sigma^2(X) = E(X - EX)^2.$$


With a random event A we associate a r.v., to be called indicator of the event A, 
which takes values 1 and 0 respectively, according as A occurs or does not occur. 
If X is the indicator of an event A of probability p, then $EX = p$ and $\sigma^2(X) = pq$,
where $q = 1 - p$. To avoid trivialities we shall assume that $pq \neq 0$.

Two laws $\mathcal{L}(X_1)$ and $\mathcal{L}(X_2)$ will be said to belong to the same complete type
if there exist two numbers $a \neq 0$ and $b$ such that $P\{X_1 \le x\} = P\{aX_2 + b \le x\}$.
If values of a are restricted to positive values, then the two laws are said to
belong to the same type. If two independent r.v.’s obey $\mathcal{L}$ and their sum belongs
to the type of $\mathcal{L}$, then $\mathcal{L}$ and its type are said to be stable. Three classes of laws
play an essential role in the CLP: the normal and the degenerate types and
the Poisson complete types.

$\mathcal{N}(m, \sigma)$ is a normal law if it is defined by

$$F(x) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{(t-m)^2}{2\sigma^2}}\,dt \qquad (\sigma > 0).$$





3 For a very thorough and deep analysis of the NC and LIT problems and their solutions
see Feller, Bull. Am. Math. Soc., Vol. 51 (1945), pp. 800-832, under the same title as that
of the present paper.
$\mathcal{L}(m)$ is a law degenerate at m, if it attaches probability 1 to the value m.
$\mathcal{P}(\lambda; a, b)$ is a Poisson law if

$$P(X = ak + b) = e^{-\lambda}\,\frac{\lambda^k}{k!} \qquad (\lambda > 0), \quad k = 0, 1, 2, \dots;$$

the familiar Poisson law is $\mathcal{P}(\lambda; 1, 0)$.

A law $\mathcal{L}(X_n)$ is said to converge to the law $\mathcal{L}(X)$ as $n \to \infty$, if $F_n(x)$
converges to $F(x)$ at the continuity points of the latter. In this paper, all limits will
be considered for $n \to \infty$, if not otherwise stated.

The structure of sequences of r.v.’s whose limit properties are investigated
will be called the limiting process of the problem. The limiting process of sequences
of sums is that of sequences of the form $S_{n,\nu_n} = \sum_{k=1}^{\nu_n} X_{n,k}$, where $\nu_n \to \infty$. The
limiting process of normed sums is that of sequences of the form $\frac{S_n}{a_n} - b_n$ with
$S_n = \sum_{k=1}^{n} X_k$, where $a_n > 0$ and $b_n$ are real numbers. Normed sums are a special
form of sequences of sums: take $\nu_n = n$, $X_{n,k} = \frac{X_k}{a_n} - \frac{b_n}{n}$; then $S_{n,\nu_n} = \frac{S_n}{a_n} - b_n$.

To avoid repetitions we shall note, once and for all, that limit types rather than
limit laws appear in the case of normed sums, because, if $\mathcal{L}(X)$ is their limit law,
then any law of its type is obtainable as a limit law by a convenient change of
origin b and of scale a, independent of n. The importance of the notion of
type is due, primarily, to this property. In fact, even more is true: if $\mathcal{L}(X_n)$
converges to $\mathcal{L}(X)$ and $\mathcal{L}(a_n X_n + b_n)$ converges to $\mathcal{L}(Y)$, then $\mathcal{L}(X)$ and $\mathcal{L}(Y)$
belong to the same type, provided neither is degenerate (Khintchine [20]).


I. CENTRAL LIMIT PROBLEM
2. Origin of the CLP: Binomial case. Three limit theorems are at the origin
of the CLP; the first, due to Bernoulli ([2], 1713), laid the ground. Let $S_n$ be
the number of occurrences of an event A of probability p in n identical and
independent trials. Then, for every $\varepsilon > 0$,

$$P\left\{\left|\frac{S_n}{n} - p\right| > \varepsilon\right\} \to 0.$$

Bernoulli found this result by a direct, but cumbersome, analysis of the
behavior of the binomial probabilities

$$P\{S_n = k\} = C_n^k\,p^k q^{n-k}, \qquad k = 0, 1, \dots, n.$$


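Sharpening this analysis, de Moivre ([7], 1730) obtained the second limit theorem
of probability theory which, in the form given to it by Laplace, states that:
For every x

$$P\left\{\frac{S_n - np}{\sqrt{npq}} < x\right\} \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{1}{2}t^2}\,dt.$$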
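As a numerical illustration of the de Moivre-Laplace statement, the following Python sketch compares the exact d.f. of the normed binomial sum with the normal d.f. at a few points. The function names and the choice of test points are illustrative assumptions, not part of the original analysis.

```python
import math

def binom_cdf_strict(n, p, x):
    """Exact P{(S_n - n p)/sqrt(n p q) < x} for S_n binomial(n, p)."""
    q = 1.0 - p
    threshold = n * p + x * math.sqrt(n * p * q)
    return sum(math.comb(n, k) * p**k * q**(n - k)
               for k in range(n + 1) if k < threshold)

def normal_cdf(x):
    """Distribution function of the normal law N(0, 1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def max_gap(n, p, xs=(-2.0, -1.0, -0.5, 0.5, 1.0, 2.0)):
    """Largest discrepancy between the two d.f.'s over the test points xs."""
    return max(abs(binom_cdf_strict(n, p, x) - normal_cdf(x)) for x in xs)
```

Evaluating `max_gap(n, 0.3)` for increasing n should show the discrepancy shrinking, in line with the theorem.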
Suppose now, with Poisson ([36], 1837), that the probability $p = p_n$ depends
upon the number n of trials and, more precisely, that $p_n = \frac{\lambda}{n}$, where $\lambda$ is a
positive constant. Write then $S_{n,n}$ instead of $S_n$ for the number of occurrences
of the considered event in a group of n trials. By a direct analysis of the binomial
probabilities, much easier to carry out than the preceding ones, it follows that
for $k = 0, 1, \dots$,

$$P\{S_{n,n} = k\} \to e^{-\lambda}\,\frac{\lambda^k}{k!}.$$
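The Poisson limit can also be checked numerically. The sketch below, with illustrative function names, compares the binomial probabilities for $p_n = \lambda/n$ with the Poisson probabilities $e^{-\lambda}\lambda^k/k!$; the cut-off `kmax` is an arbitrary assumption made only to keep the comparison finite.

```python
import math

def binom_pmf(n, p, k):
    """P{S = k} for S binomial(n, p)."""
    return math.comb(n, k) * p**k * (1.0 - p)**(n - k)

def poisson_pmf(lam, k):
    """e^{-lam} lam^k / k!."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def poisson_gap(n, lam, kmax=20):
    """max over k <= kmax of |P{S_{n,n} = k} - e^{-lam} lam^k / k!| with p_n = lam / n."""
    p_n = lam / n
    return max(abs(binom_pmf(n, p_n, k) - poisson_pmf(lam, k))
               for k in range(min(n, kmax) + 1))
```

For a fixed $\lambda$, `poisson_gap(n, lam)` should decrease as n grows.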
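Let $X_i$ be the indicator of the event A in the i-th trial. The number of
occurrences $S_n$ is the sum $\sum_{i=1}^{n} X_i$ of n of these independent and identically
distributed indicators. The first two limit theorems mean that

$$\mathcal{L}\left(\frac{S_n - np}{n}\right) \to \mathcal{L}(0) \quad \text{and} \quad \mathcal{L}\left(\frac{S_n - np}{\sqrt{npq}}\right) \to \mathcal{N}(0, 1).$$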

Thus we have two limiting processes, (both special and completely specified 
forms of normed sums), and two limit laws (more precisely two limit types, see 
introduction), a degenerate and a normal one. 

Poisson’s limiting process is utterly different. $S_{n,n}$ is still a sum $\sum_{k=1}^{n} X_{n,k}$
of independent and identically distributed indicators but, as n varies, all $X_{n,k}$
change, $P(X_{n,k} = 1) = \frac{\lambda}{n}$ and

$$\mathcal{L}(S_{n,n}) \to \mathcal{P}(\lambda; 1, 0).$$


While the first two theorems with their special limiting processes and limit
laws played a central role in the development of Probability theory, Poisson’s
result stood isolated and ignored until about fifteen years ago.⁴ We shall see
further that there was a deep reason for its isolation and also that, surprisingly 
enough, Poisson laws are, in a sense, more fundamental for the CLP, than the 
normal law. 

3. The classical CLP and its extension. From the time of Laplace until 1935,
research in the domain of limit laws was centered about the extension of the
validity of the first two limit theorems to summands other than indicators.
This is the period of the classical CLP: Let $S_n = \sum_{k=1}^{n} X_k$ be sums of independent
r.v.’s. Find necessary and sufficient conditions for the LLN and for NC, i.e.,
conditions under which, respectively,

$$\text{LLN:} \quad \mathcal{L}\left(\frac{S_n - ES_n}{n}\right) \to \mathcal{L}(0),$$

$$\text{NC:} \quad \mathcal{L}\left(\frac{S_n - ES_n}{\sigma(S_n)}\right) \to \mathcal{N}(0, 1).$$

4 In Uspensky’s textbook (1937!) Poisson’s law is mentioned once, in an exercise.
It is assumed that the $EX_k$’s and $EX_k^2$’s exist. The d.f. not being completely
specified as in the Bernoulli case, the direct Bernoulli-de Moivre approach is of no
avail and general methods are necessary. The first to appear was the method of
moments relative to bounds of d.f. in terms of their moments (Tchebicheff [40],
Markov [37]). The relation

$$P\left\{\frac{|S_n - ES_n|}{n} > \varepsilon\right\} \le \frac{\sigma^2(S_n)}{n^2\varepsilon^2}, \qquad \varepsilon > 0,$$

together with

$$\sigma^2(S_n) = \sum_{k=1}^{n} \sigma^2(X_k),$$

entails at once a LLN theorem (Tchebicheff-Markov): If

$$\frac{1}{n^2} \sum_{k=1}^{n} \sigma^2(X_k) \to 0,$$

then the LLN holds.
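In the Bernoulli case both sides of the Tchebicheff inequality can be computed, which gives a quick check of how the bound forces the LLN. The sketch below is only an illustration under that assumption (indicator summands, so that $\sigma^2(S_n) = npq$); the function names are hypothetical.

```python
import math

def tail_prob(n, p, eps):
    """Exact P{|S_n/n - p| > eps} for S_n binomial(n, p)."""
    q = 1.0 - p
    return sum(math.comb(n, k) * p**k * q**(n - k)
               for k in range(n + 1) if abs(k / n - p) > eps)

def chebyshev_bound(n, p, eps):
    """sigma^2(S_n) / (n^2 eps^2) with sigma^2(S_n) = n p q."""
    return p * (1.0 - p) / (n * eps**2)
```

The bound is crude (it may exceed 1 for small n), but it tends to zero like 1/n, and the exact tail probability stays below it.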
This result can be easily improved (bringing it into closer analogy with
Lyapunov’s theorem): If there exists a constant $\delta > 0$ such that

$$\frac{1}{n^{1+\delta}} \sum_{k=1}^{n} E|X_k - EX_k|^{1+\delta} \to 0,$$

then the LLN holds.

It contains then Markov’s LLN condition: LLN holds if $E|X_k - EX_k|^{1+\delta} < C$,
where C is independent of k.

In a much more elaborate form the method of moments gives also a NC
theorem (Tchebicheff-Markov): If $EY_n^k \to EZ^k$ for $k = 1, 2, \dots$, and
$\mathcal{L}(Z) = \mathcal{N}(0, 1)$, then $\mathcal{L}(Y_n) \to \mathcal{N}(0, 1)$.

This theorem has been extended to more general limit laws. However the
inherent defects of the method of moments remain. Even if moments of all
orders exist, they do not necessarily determine a unique d.f. A definitive result
in this direction is the Fréchet-Shohat theorem: If $EY_n^k \to m^{(k)}$ for all k, there exists
a subsequence $\mathcal{L}(Y_{n'})$ which converges to a limit law $\mathcal{L}$ with moments $m^{(k)}$.
Moreover, if the moment problem is determined, i.e., if the $m^{(k)}$ determine a unique law,
then the whole sequence $\mathcal{L}(Y_n)$ converges to $\mathcal{L}$.

To apply the convergence theorem to the NC part of the classical CLP,
one has to assume existence of moments of all orders. In particular, it does not
seem suitable for proving Lyapunov’s theorem. Yet, the simple truncation idea
(Markov) not only overcomes this seemingly insurmountable obstacle, but also
provides a method per se. It associates with the summands $X_k$ “truncated”
r.v.’s $X_{n,k}$; for $k \le n$ and $c_n$ conveniently chosen real numbers,

$$X_{n,k} = \begin{cases} X_k & \text{if } |X_k| \le c_n, \\ 0 & \text{if } |X_k| > c_n. \end{cases}$$
Nevertheless, the method of moments is too cumbersome and was soon to be
discarded in favor of that of ch.f.’s.

The turning point for the entire CLP is Lyapunov’s introduction of the
method of ch.f.’s. The ch.f.’s were well known and used already by Laplace.
However, the first convergence property, proved but not stated, is due to
Lyapunov [28]: If the ch.f.’s $g_n(u)$ of $\mathcal{L}(Y_n)$ converge to the ch.f. $e^{-u^2/2}$ of $\mathcal{N}(0, 1)$,
then $\mathcal{L}(Y_n) \to \mathcal{N}(0, 1)$. From it he deduced the first general NC theorem [28, 29]:
If there exists a number $\delta > 0$ such that

$$\frac{1}{\sigma^{2+\delta}(S_n)} \sum_{k=1}^{n} E|X_k - EX_k|^{2+\delta} \to 0,$$

then NC holds.

The ch.f. became, in the hands of P. Lévy [21], a general tool, instrumental
in the subsequent tremendous growth of the CLP, with the so-called

CONTINUITY THEOREM. If the ch.f.’s $g_n(u)$ of $\mathcal{L}(Y_n)$ converge to a function $g(u)$
continuous at $u = 0$, then $\mathcal{L}(Y_n)$ converge to a limit law $\mathcal{L}$ and $g(u)$ is its ch.f.; and
conversely.
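The method of ch.f.’s is easy to try out on the Bernoulli case, where the ch.f. of the normed sum is available in closed form. The following sketch (function names are illustrative assumptions) evaluates that ch.f. and the normal ch.f. $e^{-u^2/2}$ so that their pointwise closeness for large n can be inspected.

```python
import cmath
import math

def chf_standardized_binomial(n, p, u):
    """Ch.f. at u of (S_n - n p)/sqrt(n p q) for S_n binomial(n, p)."""
    q = 1.0 - p
    s = math.sqrt(n * p * q)
    one_trial = q + p * cmath.exp(1j * u / s)   # ch.f. of one indicator at u/s
    centering = cmath.exp(-1j * u * n * p / s)  # effect of subtracting n p / s
    return centering * one_trial**n

def chf_normal(u):
    """Ch.f. of the normal law N(0, 1)."""
    return math.exp(-u * u / 2.0)
```

By the continuity theorem, pointwise convergence of these ch.f.’s is enough to conclude NC in this case.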

The methods of ch.f. and of truncation dominate at present the limit prob- 
lems of Probability theory. 

In spite of the generality of the above conditions for LLN and NC, they are 
not necessary conditions. In fact they are not sharp enough since they assume 
the existence of moments of higher order than those which figure in the classical 
CLP. However the tools forged proved powerful enough to get its complete 
solution. The truncation method yielded to Kolmogorov ([16], 1928) the
complete answer to the LLN problem. A “smoothing” device, due to Lyapunov,
provided Lindeberg ([20], 1922) with adequately sharp sufficient conditions;
using ch.f.’s P. Lévy ([22], 1922) proved Lindeberg’s result and Feller ([11], 1935)
showed that, under a natural restriction, these conditions are also necessary. 

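Solution of the classical CLP.

1. LLN holds if, and only if,

$$\sum_{k=1}^{n} \int_{|x| > n} dF_k(x + EX_k) \to 0 \quad \text{and} \quad \sum_{k=1}^{n} \frac{1}{n^r} \int_{|x| < n} x^r\,dF_k(x + EX_k) \to 0$$

for $r = 1, 2$.

2. NC holds and $\max_{k \le n} \frac{\sigma(X_k)}{\sigma(S_n)} \to 0$ if, and only if, for every $\varepsilon > 0$,

$$\frac{1}{\sigma^2(S_n)} \sum_{k=1}^{n} \int_{|x| > \varepsilon\sigma(S_n)} x^2\,dF_k(x + EX_k) \to 0.$$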
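To see how the condition in part 2 behaves, one can evaluate its left-hand side for n independent and identically distributed summands with a finite discrete law. The sketch below makes exactly that simplifying assumption; the function name and its arguments are hypothetical.

```python
import math

def lindeberg_sum(values, probs, n, eps):
    """(1/sigma^2(S_n)) * sum_k E[(X_k - EX_k)^2 ; |X_k - EX_k| > eps * sigma(S_n)]
    for n i.i.d. summands taking values[i] with probability probs[i]."""
    mean = sum(v * p for v, p in zip(values, probs))
    var = sum((v - mean) ** 2 * p for v, p in zip(values, probs))
    sigma_sn = math.sqrt(n * var)
    # contribution of one summand from deviations larger than eps * sigma(S_n)
    tail = sum((v - mean) ** 2 * p
               for v, p in zip(values, probs) if abs(v - mean) > eps * sigma_sn)
    return n * tail / (n * var)
```

For bounded summands the sum vanishes as soon as $\varepsilon\sigma(S_n)$ exceeds the largest deviation, which is why the condition holds trivially in that case.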


An unsatisfactory feature of the classical CLP is the assumption, made at
the start, of existence of certain moments. They are used to avoid, as $n \to \infty$,
the shift, towards infinite values, of the probability spread by changing the
origin and the scale of values of $S_n$. However there is no specific reason for
these special choices of norming quantities $a_n$ and $b_n$ except that, historically,
they appeared as a straightforward extension of the Bernoulli and de Moivre ones.
Moreover, even if these moments do not exist, there is no reason not to try
to find norming quantities. (Take the $X_k$’s to be independent and identically
distributed as follows: to $\pm\sqrt{m}$, where $m = 1, 2, \dots$, attach probabilities $\frac{3}{\pi^2 m^2}$.
The second moments are infinite; yet norming $S_n$ by $c\sqrt{n \log n}$, we have NC.)
Thus the CLP becomes the problem of the LLN and NC for general normed
sums $\frac{S_n}{a_n} - b_n$.
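The example with infinite second moments can be explored numerically. The sketch below assumes that each of the values $+\sqrt{m}$ and $-\sqrt{m}$ carries probability $3/(\pi^2 m^2)$, the normalization under which the total mass is one; under that assumption the truncated second moment grows only logarithmically, which is what makes a norming of order $\sqrt{n \log n}$ plausible. All names are illustrative.

```python
import math

def prob(m):
    """Assumed probability attached to each of +sqrt(m) and -sqrt(m)."""
    return 3.0 / (math.pi**2 * m**2)

def total_mass(m_max):
    """Sum of the probabilities of all values +-sqrt(m) with m <= m_max."""
    return sum(2.0 * prob(m) for m in range(1, m_max + 1))

def truncated_second_moment(c):
    """E[X^2 ; |X| <= c], i.e. the sum over m <= c^2 of m * 2 * prob(m)."""
    return sum(m * 2.0 * prob(m) for m in range(1, int(c * c) + 1))
```

Since `truncated_second_moment(c)` equals $(6/\pi^2)$ times a partial sum of the harmonic series, it behaves like $(12/\pi^2)\log c$ and diverges slowly as c grows.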

The extended classical NC problem was solved, masterfully and independently,
by Feller ([10], 1935) using ch.f.’s and by P. Lévy ([25], 1935) who applied the
method of truncation. The extension of the results to the more general set-up
of the following section is trivial and will be given there. Feller also solved
([11], 1937) the extended LLN problem.

In this new set-up a question arises at once. Given the r.v.’s $X_k$, do there
exist numbers which will produce the desired convergence? If so, how can they be
found? This problem is perhaps more difficult than the previous one and is
specifically linked with the limiting process of normed sums. We shall give
here a criterion, due to Feller ([10], 1935), which solves entirely the NC
problem. Take as origin of values of the summands their medians and let $c_n(\varepsilon)$ be
the g.l.b. of the x’s for which $\sum_{k=1}^{n} P(|X_k| > x) < \varepsilon$. Then norming quantities
$a_n$ and $b_n$ such that $\mathcal{L}\left(\frac{S_n}{a_n} - b_n\right) \to \mathcal{N}(0, 1)$ and $\max_{k \le n} P\left\{\frac{|X_k|}{a_n} > \varepsilon\right\} \to 0$
exist if, and only if, for every $\varepsilon > 0$,

$$\frac{1}{c_n^2(\varepsilon)} \sum_{k=1}^{n} \int_{|x| < c_n(\varepsilon)} x^2\,dF_k(x) \to \infty.$$


4. Modern CLP. At the same time that the classical CLP neared its happy 
end, a new and much wider problem of limit laws appeared and, because the 
necessary tools were at hand, was solved almost at once. Various particular 
problems, of which the classical CLP is one, contributed to its set-up. 

Since the discovery, in the Bernoulli case, of the LLN and NC, the problem 
of limit laws has been centered about extensions of their domains of validity 
for more and more general normed sums. A similar query about the Poisson 
convergence would have provided us with a new problem. As soon as we drop
the restriction that in $S_{n,n} = \sum_{k=1}^{n} X_{n,k}$ the r.v.’s $X_{n,k}$ are indicators, we are
led to the problem of finding conditions under which laws of sums of
independent r.v.’s will converge to a Poisson law. We have here not only a different
limit law than in the CLP but also a more general limiting process. An utterly
different problem, stated and solved by P. Lévy [21], is the following: find the


5 As for the LLN, norming numbers such that the LLN holds always exist, whatever
be the r.v.’s $X_k$. Hence, from the point of view of limit types of normed sums, the
degenerate type is to be considered as a degenerate form of every limit type.
possible limit laws of normed sums of independent and identically distributed r.v.’s
(the answer is that they are the stable laws). For the first time one does not inquire 
about a completely specified limit law but about the class of all limit laws for 
a fairly general limiting process. Thus, starting with limit theorems with com- 
pletely specified limiting processes and limit laws, after two centuries of struggle 
Probability theory got rid of initial restrictions. 

The general set-up is now visible. The limiting process is that of sequences of 
sums of independent r.v.’s. The queries are about the classes of possible limit 
laws and conditions of convergence. However, so general a limit problem is 
without content. In fact, the limiting process is that of arbitrary sequences of
r.v.’s: let $\{Y_n\}$ be any sequence of r.v.’s and take $X_{n,1} = Y_n$, $\mathcal{L}(X_{n,k}) = \mathcal{L}(0)$
for $k > 1$. Any law $\mathcal{L}$ belongs to the class of limit laws: take $\mathcal{L}(Y_n) = \mathcal{L}$. Hence
some restriction is needed. To find a “natural” restriction consider the previous
problems. Their common feature is that the limiting process is that of sequences
of sums of independent r.v.’s, the number of summands increasing indefinitely.
If we wish to emphasize this feature, a relatively small number of summands
ought not to have a preponderant role in the determination of the limit laws.
A “natural” restriction is then a requirement of uniform asymptotic negligibility
(uan) of the summands, i.e., for every $\varepsilon > 0$, $P\{|X_{n,k}| > \varepsilon\} \to 0$ uniformly in k.
We come thus to the Modern CLP. Let $S_{n,\nu_n} = \sum_{k=1}^{\nu_n} X_{n,k}$, $\nu_n \to \infty$, be sums
of r.v.’s $X_{n,k}$, mutually independent for every fixed n, and such that

$$\max_k P\{|X_{n,k}| > \varepsilon\} \to 0;$$

characterize the class $\{D\}$ of limit laws of the $S_{n,\nu_n}$ and find necessary and sufficient
conditions for convergence to any element of this class.

The solution of this problem is essentially due to the results of investigation
of random functions X(t) with independent increments. Let $X(0) = 0$, divide
the interval (0, t) into $\nu_n$ subintervals $(t_{k-1}, t_k)$ with $t_0 = 0$, and denote by $X_{n,k}$
the increment $X(t_k) - X(t_{k-1})$. Then $X(t) = \sum_{k=1}^{\nu_n} X_{n,k}$ where the $X_{n,k}$ are independent
r.v.’s. If, moreover, X(t) is continuous in probability for every t, i.e., if
$\mathcal{L}\{X(t + h) - X(t)\} \to \mathcal{L}(0)$ as $h \to 0$, then the $X_{n,k}$ can be chosen to obey
the uan restriction as $\nu_n \to \infty$. Hence $\mathcal{L}\{X(t)\}$ might be expected to belong
to $\{D\}$.

The particular case of the modern CLP for summands and limit laws with
finite second moments was solved by Bawly [1], using Kolmogorov’s
characterization of X(t)’s with finite second moments [7]. The general problem,
thanks to a much more general result by P. Lévy ([24], 1934), was solved by
P. Lévy, Khintchine ([20], 1937), Gnedenko ([14], [15], 1938, 1939) and Doeblin
([8], 1938-1939). The method used throughout was that of ch.f.’s (except in
the case of Doeblin, who also used the P. Lévy “dispersion” function).

One can avoid an explicit introduction of the considered random function 
X(t), limiting oneself to the corresponding (infinitely divisible) laws. For a 
very large n, $S_{n,\nu_n}$ is, roughly speaking, the sum of a very large number $\nu_n$ of very small
(in probability) independent summands. This leads at once to the consideration
of laws which possess such a property for any $\nu$ and, first, the infinitely divisible
(i.d.) laws. A law is i.d. if it is a law of sums of an arbitrarily large number of
independent and identically distributed r.v.’s. In other words, $f(u)$ is the ch.f.
of an i.d. law if $[f(u)]^{1/n}$ is a ch.f. for every positive integer n. One might expect i.d.
laws to belong to $\{D\}$ and, surprisingly enough, it turns out that, because of
the uan, $\{D\}$ contains only i.d. laws.
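The familiar Poisson law gives a concrete instance of infinite divisibility: its ch.f. is $e^{\lambda(e^{iu} - 1)}$, and the n-th root of that ch.f. is again a Poisson ch.f., with parameter $\lambda/n$. A minimal numerical check, with an illustrative function name:

```python
import cmath

def poisson_chf(lam, u):
    """Ch.f. of the familiar Poisson law with parameter lam, evaluated at u."""
    return cmath.exp(lam * (cmath.exp(1j * u) - 1.0))

def root_check(lam, n, u):
    """|f_{lam/n}(u)^n - f_lam(u)|, which should vanish up to rounding."""
    return abs(poisson_chf(lam / n, u) ** n - poisson_chf(lam, u))
```

The same check works for the normal ch.f. $e^{imu - \sigma^2u^2/2}$, whose n-th root is the ch.f. of a normal law with mean m/n and variance $\sigma^2/n$.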
We can now state the solution of the modern CLP, in three parts. Let
$\int_{-a}^{+a}{}' = \int_{-a}^{-0} + \int_{+0}^{+a}$, let $\varphi(x)$ be any function, defined and non-decreasing in
$(-\infty, -0)$ and $(+0, +\infty)$, with $\varphi(-\infty) = \varphi(+\infty) = 0$ and $\int_{-1}^{+1}{}' x^2\,d\varphi(x) < \infty$,
and let $\alpha$ and $\beta$ be real numbers.

I. The function $f(u)$ is the ch.f. of an i.d. law if, and only if,

$$\log f(u) = i\alpha u - \frac{\beta^2}{2} u^2 + \int_{-\infty}^{+\infty}{}' \left(e^{iux} - 1 - \frac{iux}{1 + x^2}\right) d\varphi(x),$$

and $f(u)$ determines uniquely $\alpha$, $\beta^2$ and $\varphi(x)$ at all the continuity points of the latter
(P. Lévy).

Normal laws are obtained for $\varphi(x) \equiv 0$ and Poisson laws correspond to the
$\varphi(x)$ with one point of increase ($x \neq 0$) only. The fundamental role of Poisson
laws appears clearly since, roughly speaking, an i.d. law is the convolution of a
normal law and a continuum of Poisson ones. This role is further emphasized
by the following theorem (Khintchine [20]): A law is i.d. if, and only if, it is
the limit law of sequences of sums of independent Poisson r.v.’s. In other words,
the class of i.d. laws is the closure of laws of finite sums of independent Poisson
r.v.’s.
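As a worked instance of the remark on Poisson laws, take P. Lévy's representation in the usual form $\log f(u) = i\alpha u - \frac{\beta^2}{2}u^2 + \int' \bigl(e^{iux} - 1 - \frac{iux}{1+x^2}\bigr)\,d\varphi(x)$ and a function $\varphi$ with a single jump. The particular choices below (jump of size $\lambda$ at $x = 1$, $\beta = 0$, $\alpha = \lambda/2$) are assumptions made for the illustration only.

```latex
% phi vanishes for x < 0, equals -lambda on (0, 1) and 0 for x >= 1,
% so that d(phi) is a point mass lambda at x = 1.
\begin{align*}
\log f(u) &= i\alpha u + \lambda\left(e^{iu} - 1 - \frac{iu}{1 + 1^2}\right)
           = i\left(\alpha - \frac{\lambda}{2}\right)u + \lambda\left(e^{iu} - 1\right), \\
\alpha = \frac{\lambda}{2} \;&\Longrightarrow\; \log f(u) = \lambda\left(e^{iu} - 1\right),
\end{align*}
```

which is the logarithm of the ch.f. of the familiar Poisson law with parameter $\lambda$. A jump at a point $x = a \neq 0$ together with another choice of $\alpha$ gives, in the same way, a law of the complete Poisson type with step a and a shifted origin.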

II. The class $\{D\}$ of limit laws of the modern CLP coincides with that of i.d.
laws (P. Lévy-Khintchine).

Together with I this result characterizes in an explicit manner the class $\{D\}$.
An immediate question arises (Khintchine). What about the limit laws of normed
sums? The answer is the following (P. Lévy [27]). Let
$\psi_1(y) = -\varphi(x)$ for $x < 0$, $\psi_2(y) = \varphi(x)$ for $x > 0$, where $y = \log |x|$. The limit
laws of normed sums, under uan, are the i.d. laws with convex $\psi_k(y)$, $k = 1, 2$.

In particular a Poisson law does not belong to this subclass $\{D_N\}$ of $\{D\}$,
hence cannot be obtained as a limit law of normed sums. This brings out the
deep reason for the isolation in which the Poisson law remained as long as the
limiting process was restricted to that of normed sums.⁶ It shows that, with
respect to the possible limit laws, the limiting process of the modern CLP is
definitely wider than that of the classical CLP and of its extension. However
the entire class $\{D\}$ can be obtained with normed sums, provided we consider




6 A problem, specific for normed sums, arises: given r.v.’s $X_k$, find necessary and
sufficient conditions for existence of norming numbers such that the laws of normed sums
would converge to a given element of $\{D_N\}$ and, if they exist, find them. Feller’s NC
criterion solves a particular case of this problem.
not only limit laws but also “accumulation” laws (P. Lévy-Khintchine): A law
is i.d. if, and only if, it is the limit law of a subsequence of normed sums of
independent and identically distributed r.v.’s.

I and II provided Gnedenko and, independently, Doeblin with the properties
which allowed them to find conditions of convergence, thus completing the
solution of the modern CLP. Let

$$\sigma_\varepsilon^2(X) = \int_{|x| < \varepsilon} x^2\,dF(x) - \left[\int_{|x| < \varepsilon} x\,dF(x)\right]^2$$

denote a “truncated” variance of X.

III. Under uan, $\mathcal{L}(S_{n,\nu_n} - b_n)$ converges, necessarily to an i.d. law, for a
convenient choice of $b_n$ if, and only if,

(i) $\sum_{k=1}^{\nu_n} F_{n,k}(x) \to \varphi(x)$ for $x < 0$, $\quad \sum_{k=1}^{\nu_n} [1 - F_{n,k}(x)] \to -\varphi(x)$ for $x > 0$

at the continuity points of $\varphi(x)$, and

(ii) $\lim_{\varepsilon \to 0} \liminf_{n \to \infty} \sum_{k=1}^{\nu_n} \sigma_\varepsilon^2(X_{n,k}) = \beta^2$.
e>0 nn k=1 


In particular, since normal laws correspond to $\varphi(x) \equiv 0$, the NC conditions
of Feller and P. Lévy follow: $\mathcal{L}(S_{n,\nu_n} - b_n)$ converges to $\mathcal{N}(0, 1)$ for a convenient
choice of $b_n$ and uan holds if, and only if, for every $\varepsilon > 0$,

$$\sum_{k=1}^{\nu_n} \int_{|x| > \varepsilon} dF_{n,k}(x) \to 0 \quad \text{and} \quad \sum_{k=1}^{\nu_n} \sigma_\varepsilon^2(X_{n,k}) \to 1.$$

The first condition shows that among all limit laws under uan, limit
normality corresponds to a sufficiently strong asymptotic negligibility of the
summands, and, more precisely, to

$$\sum_{k=1}^{\nu_n} P(|X_{n,k}| > \varepsilon) \to 0,$$

or, equivalently, to

$$P\left(\max_k |X_{n,k}| > \varepsilon\right) \to 0.$$
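The equivalence of the last two conditions rests, for independent summands, on the elementary sandwich $1 - e^{-\sum_k p_k} \le 1 - \prod_k (1 - p_k) \le \sum_k p_k$, where $p_k = P(|X_{n,k}| > \varepsilon)$. The sketch below evaluates the three quantities for a given list of tail probabilities; the names are illustrative assumptions.

```python
import math

def prob_max_exceeds(tail_probs):
    """P(max_k |X_{n,k}| > eps) for independent summands,
    where tail_probs[k] = P(|X_{n,k}| > eps)."""
    return 1.0 - math.prod(1.0 - p for p in tail_probs)

def sandwich(tail_probs):
    """Return (lower bound, P(max > eps), upper bound)."""
    total = sum(tail_probs)
    return 1.0 - math.exp(-total), prob_max_exceeds(tail_probs), total
```

When the sum of the tail probabilities tends to zero, both bounds tend to zero together, so either condition forces the other.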


Another illuminating characterization of NC (Raikov [39]) follows also from
III. Take for origin of values of summands the truncated first moments
$\int_{|x| < 1} x\,dF_{n,k}(x)$. Then $\mathcal{L}(S_{n,\nu_n} - b_n) \to \mathcal{N}(0, 1)$ for a convenient choice of $b_n$
if, and only if, $\mathcal{L}\left(\sum_{k=1}^{\nu_n} X_{n,k}^2\right) \to \mathcal{L}(1)$.

5. CLP in the case of dependence. Limit problems for sums of dependent
r.v.’s were considered for the first time by Markov [37], less than fifty years ago.
He extended the first two limit theorems of probability theory to the case of
events linked in chain, i.e., such that $P(A_k \mid A_1, \dots, A_{k-1}) = P(A_k \mid A_{k-1})$.
However the crucial work in this field is the celebrated memoir by S. Bernstein
([3], 1927) which has the same historical importance for the dependence
case as that of Lyapunov has for the classical CLP.

Let $\{X_k\}$ be a sequence of r.v.’s. $E'X_k$ will denote the conditional expectation
of $X_k$, given $X_1, \dots, X_{k-1}$. Consider the sequence of sums $S_n = \sum_{k=1}^{n} X_k$ with
$EX_k = 0$ and let $\sigma_n^2 = \sum_{k=1}^{n} \sigma^2(X_k)$.

BERNSTEIN’S NC THEOREM. If

(i) $\frac{1}{\sigma_n} \sum_{k=1}^{n} \sup |E'X_k| \to 0$, (ii) $\frac{1}{\sigma_n^2} \sum_{k=1}^{n} \sup |E'X_k^2 - EX_k^2| \to 0$,

and

(iii) $\frac{1}{\sigma_n^3} \sum_{k=1}^{n} \sup E'|X_k|^3 \to 0$,

then

$$\mathcal{L}\left(\frac{S_n}{\sigma_n}\right) \to \mathcal{N}(0, 1).$$

on 


Obviously, if the $X_k$’s are independent, this theorem reduces to Lyapunov’s
with $\delta = 1$. The method used is still that of ch.f.’s. From this result Bernstein
deduces various particular NC cases and, applying them to Markov chains, ex- 
tends the latter’s results. 

The unpleasant feature of the above theorem is the use of suprema of
conditional expectations and, except when the r.v.’s $X_k$ are bounded, one cannot
expect these suprema to be finite. On the other hand, the conditional expectations
are r.v.’s and it would be natural to associate their values with the corresponding
probabilities. This can be done and Bernstein’s theorem can be improved in
various directions simultaneously. First, it may be stated for sequences of sums
$S_{n,\nu_n}$; this is trivial. Next, it extends to $\delta > 0$ instead of $\delta = 1$; this contains
completely Lyapunov’s result but is of secondary interest. Then NC can be
replaced by asymptotic normality, i.e., by the existence of a sequence of normal
laws $\mathcal{N}(0, \sigma_n)$ such that the “distance” between $\mathcal{L}(S_{n,\nu_n})$ and $\mathcal{N}(0, \sigma_n)$ would
approach zero as $n \to \infty$; this is quite simple to get. However, significant
improvements are obtained on replacing suprema by expectations. Let $F_n(x)$ be
the d.f. of $S_{n,\nu_n}$ and $G_n(x)$ be that of $\mathcal{N}(0, \sigma_n)$. Then, taking $EX_{n,k} = 0$, we have
the following

NC THEOREM. If (i) $\sum_k E|E'X_{n,k}| \to 0$, (ii) $\sum_k E|E'X_{n,k}^2 - EX_{n,k}^2| \to 0$
and (iii) there exists a constant $\delta > 0$ such that $\sum_k E|X_{n,k}|^{2+\delta} \to 0$, then
$F_n(x) - G_n(x) \to 0$.

This theorem shows that, so far as moments of order higher than the second are
concerned, the NC condition is the same as in the case of independence. In this
last case the theorem is a slight improvement of that of Lyapunov. In 1941
conditions for LLN and NC were given (Loève [31], [32]) in the frame of the modern
CLP, without assuming the existence of moments; when independence is
assumed, they reduce to those given by Feller. Conditions for NC which in the
case of independence, reduce to Lindeberg’s, were then deduced in the particular 
case of finite second moments and special cases of NC, including those con- 
sidered by S. Bernstein, were obtained. 

The whole modern CLP had not been considered until lately (Loève [33-35]).
It appeared useful to extend the CLP to an “Asymptotic Central Problem”
(ACP); primarily, to the behavior of $\mathcal{L}(S_{n,\nu_n})$ as $n \to \infty$. This, in turn, led to the
introduction of laws “in a wide sense,” i.e., with possible positive probabilities
for infinite values. To the sequence $\{\mathcal{L}(S_{n,\nu_n})\}$ is associated another conveniently
chosen sequence $\mathcal{L}_n$ of laws of sums; if $\mathcal{L}_n \to \mathcal{L}$ or $\mathcal{L}_n = \mathcal{L}$ then the ACP reduces
to the CLP. The investigation uses an extension of the P. Lévy convergence
theorem for ch.f.’s and the modern CLP solutions are obtained as particular
cases. The case of sums of a random number of r.v.’s,⁷ as well as the
multidimensional case, are easily treated by the same methods [35].

Many new problems arise in ACP. The foremost corresponds to possible
relaxations of the uan condition. For instance, in the case of independence, the
relaxed condition

$$\max_k P\{|X_{n,k} - Y_k| > \varepsilon\} \to 0, \quad \text{for every } \varepsilon > 0,$$

where $Y_1, Y_2, \dots$ are independent, does not change, essentially, the nature
of the ACP. Yet, as soon as dependence is introduced, the whole outlook changes
and it would be interesting to investigate various new possibilities which thus
arise. On the other hand, stricter than uan conditions are of special interest
when independence is not assumed. The one which seems natural is the following:

$$\max_k \sup P'\{|X_{n,k}| > \varepsilon\} \to 0, \quad \text{for every } \varepsilon > 0,$$

where $P'(A_{n,k})$ denotes the conditional probability of the event $A_{n,k}$, given
$X_{n,1}, \dots, X_{n,k-1}$. An immediate problem is whether this or an analogous
restriction enables us to find, not only sufficient, but also necessary conditions
for various convergences and various cases of dependence.


II. THE STRONG CENTRAL LIMIT PROBLEM


6. The Bernoulli case and its extension. A sequence $\{X_n\}$ such that the
corresponding sequence of laws converges does not, in general, determine a r.v.
X which might be considered, in some sense, as the limit of $X_n$. However, if we
define two r.v.’s X and X’ such that $P(X \neq X') = 0$ as equivalent, then,
whenever $\mathcal{L}(X_m - X_n) \to \mathcal{L}(0)$ as $\frac{1}{m} + \frac{1}{n} \to 0$, the sequence $\{X_n\}$ determines a

7 H. Robbins (Bull. Am. Math. Soc., Vol. 54 (1948), pp. 1151-1161) studied in detail the
case of independent and identically distributed $X_k$’s with $EX_k^2 < \infty$ and $\nu_n$, independent
of the $X_k$’s, with $E\nu_n^2 < \infty$.
unique r.v. X (up to an equivalence) for which $P\{|X_n - X| > \varepsilon\} \to 0$ for
every $\varepsilon > 0$. This X is the limit in probability of $X_n$.

Yet, an observed sequence of values of $\{X_n\}$ need not converge to the
observed value of X. For instance, let Y be a r.v. uniformly distributed over (0, 1).
Consider the sequence $\{D_n\}$ of partitions of (0, 1) into n equal subintervals
and to the k-th subinterval of $D_n$ attach the indicator $X_{n,k}$ of the event that
Y falls within this subinterval. The sequence
$X_{1,1}; X_{2,1}, X_{2,2}; X_{3,1}, X_{3,2}, X_{3,3}; \dots$
converges in probability to zero since $P(X_{n,k} \neq 0) = \frac{1}{n}$, for
$k = 1, 2, \dots, n$, approaches zero as $n \to \infty$. On the other hand, observed values
of the $X_{n,k}$’s, for $k = 1, 2, \dots, n$, will contain $n - 1$ zeros and a one, except in cases
of total probability zero. Hence, except in these cases, any observed sequence
will contain infinitely many zeros and infinitely many ones and will not converge.
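The counterexample is easy to reproduce for a fixed observed value y of Y. The sketch below builds the n-th block of indicators; the half-open convention for the subintervals is an assumption made here so that every y in [0, 1) falls in exactly one of them (the endpoints form the exceptional set of probability zero).

```python
def row(n, y):
    """Indicators X_{n,1}, ..., X_{n,n}: X_{n,k} = 1 iff y lies in the k-th
    of n equal subintervals [(k-1)/n, k/n) of the unit interval."""
    return [1 if (k - 1) / n <= y < k / n else 0 for k in range(1, n + 1)]

def observed_sequence(y, n_max):
    """Concatenate the blocks row(1, y), row(2, y), ..., row(n_max, y)."""
    seq = []
    for n in range(1, n_max + 1):
        seq.extend(row(n, y))
    return seq
```

Every block contains exactly one 1, so the concatenated sequence keeps returning to 1 however far one goes, although the proportion of ones in the n-th block, 1/n, tends to zero.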


The Bernoulli theorem means only that $f_n - p$, where $f_n = \frac{S_n}{n}$, converges in probability to zero. Borel showed, in a fundamental memoir ([5], 1909), that Bernoulli's statement is too weak, and, in fact, that observed values of $f_n - p$ converge to zero, except in cases of total probability zero. Borel's proof is based upon a direct analysis of the de Moivre-Laplace approach to NC. Thus a new domain in probability theory was opened to exploration.


FIRST STRONG LIMIT THEOREM. In the Bernoulli case

$$P\{\lim f_n = p\} = 1.$$


This leads to the introduction in probability theory of the notion of almost sure (a.s.) convergence:

$$X_n \to X \text{ a.s. if } P\{\lim X_n = X\} = 1,$$

or, equivalently, if for every $\epsilon > 0$,

$$P\{|X_{n+k} - X| > \epsilon \text{ for } k = 1, \text{ or } 2, \text{ or } \ldots \text{ ad inf.}\} \to 0 \text{ as } n \to \infty.$$


If we denote by $A_n$ the event $|X_n - X| > \epsilon$, we see that we are concerned here with

$$P = P(\text{realization of infinitely many events } A_n) = \lim_{n \to \infty} \lim_{\nu \to \infty} P(A_{n+1} \cup \cdots \cup A_{n+\nu}).$$

From Boole's inequality

$$P(A_{n+1} \cup \cdots \cup A_{n+\nu}) \le \sum_{k=n+1}^{n+\nu} P(A_k)$$

follows, at once, the fundamental BOREL-CANTELLI LEMMA. If $\sum_n P(A_n) < \infty$ then $P = 0$. This lemma can be extended, using sharper inequalities (Loève [32]).


8 Already Poincaré considered such probabilities in his investigation of "recurrence", and this, before the notion of completely additive measures was born.
Now apply the Tchebicheff-Markov inequality

$$P\{|X_n - X| > \epsilon\} \le \frac{E|X_n - X|^r}{\epsilon^r}, \qquad r > 0,$$

and the a.s. criterion follows: if for some $r > 0$, $\sum_n E|X_n - X|^r < \infty$, then $X_n \to X$ a.s.
Applying it, with $r = 4$, to the Bernoulli case, Cantelli [6] obtained an almost immediate proof of Borel's result. An even simpler proof is as follows:
$\sum_n E|f_{n^2} - p|^2 < \infty$ since $E(f_n - p)^2 = \frac{pq}{n}$, hence $f_{n^2} - p \to 0$ a.s. Moreover, $|f_\nu - f_{n^2}| \le \frac{2}{n}$ for $0 \le \nu - n^2 \le 2n$, hence $f_\nu - f_{n^2} \to 0$ in the usual sense, uniformly in $\nu$, and the theorem is proved. This last method applies as well to sequences of dependent events $\{B_n\}$, which constitute a natural extension of the Bernoulli case. Let

$$p_1(n) = \frac{1}{n} \sum_{k=1}^{n} P(B_k), \qquad p_2(n) = \binom{n}{2}^{-1} \sum_{1 \le j < k \le n} P(B_j B_k),$$

$\delta_n = p_2(n) - p_1^2(n)$ (in the Bernoulli case $\delta_n = 0$!). It is very easy to show that $f_n - p_1(n) \to 0$ in probability if, and only if, $\delta_n \to 0$; this extends the Bernoulli theorem. Moreover, if $n|\delta_n| < C < \infty$ then $f_n - p_1(n) \to 0$ a.s. (Loève [31]),
and Dvoretzky [10] proved that it is enough to have $\sum_n \frac{|\delta_n|}{n} < \infty$. Thus we have a simple extension of Borel's result.
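
Cantelli's use of the criterion with $r = 4$ rests on the fact that $E(f_n - p)^4$ is of order $n^{-2}$ and hence summable. The sketch below checks this exactly for a few $n$; the function names and the value $p = 1/3$ are illustrative assumptions, and the closed form $npq(1 + 3(n-2)pq)$ is the standard expression for the fourth central moment of a binomial variable rather than a formula taken from the text.

```python
from fractions import Fraction
from math import comb

def fourth_central_moment(n, p):
    """Exact E(S_n - n p)^4 for S_n binomial(n, p), by direct summation."""
    q = 1 - p
    return sum(comb(n, j) * p**j * q**(n - j) * (j - n * p) ** 4
               for j in range(n + 1))

def closed_form(n, p):
    """Standard closed form n p q (1 + 3 (n - 2) p q) for the same moment."""
    q = 1 - p
    return n * p * q * (1 + 3 * (n - 2) * p * q)

p = Fraction(1, 3)   # hypothetical success probability
# E(f_n - p)^4 = E(S_n - n p)^4 / n^4, which is at most 1/n^2 for every n
fourth_moments_of_f = {n: fourth_central_moment(n, p) / n**4 for n in (1, 2, 5, 20)}
```

Since $\sum_n n^{-2} < \infty$, the criterion applies directly to $f_n$ itself, without passing to the subsequence $f_{n^2}$.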

The method used by Borel, while uselessly complicated in view of the result obtained, is very powerful and, by sharpening it, the law of the iterated logarithm (Khintchine [18]) follows.

SECOND STRONG LIMIT THEOREM. In the Bernoulli case

$$P\left\{\limsup \frac{S_n - np}{\sigma_n (2 \log\log \sigma_n)^{1/2}} = 1\right\} = 1,$$

where $\sigma_n = \sigma(S_n)$.

Let us use the following terminology (P. Lévy [26]). A non-decreasing se- 
quence {¢,} of positive numbers belongs to the lower class L, if the probability 
that S, < ¢,, from some n onwards, is 1, and it belongs to the upper class U 
if this probability is 0. The’ following criterion een applies: In the 


Bernoulli case {¢,} belongs to L or U, respectively, according as Din > F One "4S oe 
or $< \infty$. Clearly this result contains Khintchine's LIT.

7. The general case. The question of domains of validity of the obtained results arises immediately and thus the SCLP appears in its present form. Let $S_n = \sum_{k=1}^{n} X_k$ be sums of r.v.'s $X_k$, independent or not. Find conditions for 1° a.s. convergence of $\frac{S_n}{n}$ or, more generally [31], of $\frac{S_n}{a_n}$, $a_n \uparrow \infty$ (SLLN); 2° the law of the iterated logarithm (LIT) and, more generally, criteria for classifying sequences $\{\varphi_n\}$.

The second problem, in the case of independent summands, possesses almost complete solutions due, respectively, to Kolmogorov [17] and to Feller [13].

a. If $\sup |X_k| = o(\sigma_n / (\log\log \sigma_n)^{1/2})$ for $k \le n$, then LIT holds.

b. If $\sup |X_k| = O(\sigma_n / (\log\log \sigma_n)^{3/2})$ for $k \le n$, then the criterion for the Bernoulli case continues to hold. (Feller also gave sharper criteria.)

In the case of dependent summands general results were obtained by P. Lévy [26] and for Markov chains by Doeblin [7]. The problem belongs (at present) to the domain of NC; it is complicated and pries deeply into the behavior of probabilities as $n \to \infty$. Yet, in the case of independence, the dichotomy into classes L and U is more general as shown by the following property (P. Lévy [26]). If $\{S_n\}$ is a sequence of consecutive sums of independent r.v.'s, and cannot be reduced by adding constants to an a.s. convergent sequence, then, for any given sequence $\{c_n\}$ of sure numbers, $P(S_n > c_n \text{ for an infinity of values of } n) = 0$ or 1.

The SLLN problem seems easier. Nevertheless it is far from being solved; we don't even know necessary and sufficient conditions for the SLLN in the case of independent summands in terms of individual d.f.'s.^9 The essential tools are, besides the fundamental Borel-Cantelli lemma, 1° the truncation method together with the convergence in $r$-mean: $X_n \to X$ in $r$-mean if $E|X_n - X|^r \to 0$ ($r > 0$), 2° the Kronecker lemma: If $\sum_k \frac{x_k}{a_k}$ is convergent, then $\frac{1}{a_n} \sum_{k=1}^{n} x_k \to 0$ ($a_n \uparrow \infty$). It provides a possibility of transforming problems about the SLLN into those of a.s. convergence of series of r.v.'s, at least when sufficient conditions are sought for.
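
The Kronecker lemma can be seen at work on a purely numerical sequence. This is a minimal sketch under assumed choices, none of which come from the text: $x_k = (-1)^k \sqrt{k}$ and $a_k = k$, so that $\sum_k x_k/a_k = \sum_k (-1)^k/\sqrt{k}$ is a convergent alternating series.

```python
import math

def kronecker_demo(n):
    """Return (sum_{k<=n} x_k / a_k, (1/a_n) * sum_{k<=n} x_k)
    for the illustrative choice x_k = (-1)^k sqrt(k), a_k = k."""
    series = 0.0   # partial sum of the series sum x_k / a_k
    total = 0.0    # partial sum of the x_k themselves
    for k in range(1, n + 1):
        x_k = (-1) ** k * math.sqrt(k)
        series += x_k / k
        total += x_k
    return series, total / n

series_100, average_100 = kronecker_demo(100)
series_10000, average_10000 = kronecker_demo(10000)
```

The first component settles near a finite limit, while the second, the normalized sum $\frac{1}{a_n}\sum_{k \le n} x_k$, shrinks toward zero even though $\sum_{k \le n} x_k$ itself grows like $\sqrt{n}/2$.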

In the case of independent summands one can start with the following property of series (Lévy [23]): a.s. convergence of $\sum_k X_k$ is equivalent to convergence in probability. (It can be shown that this property holds also for certain classes of dependent summands.) On the other hand, convergence in q.m. ($r = 2$) entails convergence in probability. Hence, when $EX_k^2 < \infty$, taking $EX_k$ as the origin of values of $X_k$, it follows that if $\sum_k \sigma^2(X_k) < \infty$, then $S_n$ a.s. converges. Kolmogorov proved this result using his celebrated inequality which considerably strengthens that of Tchebicheff:

$$P\{\max_{k \le n} |S_k| > \epsilon\} \le \frac{\sigma^2(S_n)}{\epsilon^2}.$$

This inequality has been extended by P. Lévy [26], and by Loève [32] to dependent summands and conditions for a.s. convergence were deduced from it.
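
Kolmogorov's inequality can be checked exactly in a small case. The sketch below assumes a symmetric random walk with steps $\pm 1$ (so $EX_k = 0$ and $\sigma^2(S_n) = n$) and enumerates all $2^n$ paths; the function name and the values $n = 10$, $\epsilon = 4$ are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def max_exceed_probability(n, eps):
    """Exact P{max_{k<=n} |S_k| > eps} for sums of n independent +/-1 steps
    with equal probabilities, found by enumerating all 2^n paths."""
    hits = 0
    for steps in product((-1, 1), repeat=n):
        s = 0
        for x in steps:
            s += x
            if abs(s) > eps:   # the path has left [-eps, eps]; count it once
                hits += 1
                break
    return Fraction(hits, 2 ** n)

n, eps = 10, 4
lhs = max_exceed_probability(n, eps)   # left side of the inequality
rhs = Fraction(n, eps ** 2)            # sigma^2(S_n) / eps^2 with sigma^2(S_n) = n
```

The bound controls the whole path up to time $n$, not only the terminal value $S_n$, which is what makes it stronger than Tchebicheff's inequality.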
If the $EX_k^2$ are not finite, the truncation method is applied. Put $X_k' = X_k$ if $|X_k| \le 1$ and $= 0$ if $|X_k| > 1$. Then (Khintchine-Kolmogorov) $\sum_k X_k$, where the $X_k$ are independent r.v.'s, is a.s. convergent if, and only if, $\sum_k P(X_k \neq X_k')$, $\sum_k \sigma^2(X_k')$, $\sum_k E(X_k')$ converge.

9 A first step in this direction is due to U. V. Prokhorov, "On the strong law of large numbers" (in Russian), Dokl. Ak. Nauk. Vol. 69 (1949), pp. 607-610. See also a paper by K. L. Chung to appear in the Proceedings of the Second Berkeley Symposium.

It is not difficult to obtain conditions for series of dependent summands. Let $q_n(t) = P\{|X_n| > t\}$, $\xi_n = \int_{-\epsilon}^{+\epsilon} x\, dF_n'(x)$, where $F_n'(x)$ is the conditional d.f. of $X_n$, given $X_1, \ldots, X_{n-1}$. If $\sum_n \int_0^{\epsilon} t\, q_n(t)\, dt < \infty$ for an $\epsilon > 0$, then $\sum_n (X_n - \xi_n)$ a.s. converges.
By using Kronecker's lemma the results above yield immediately sufficient conditions for the SLLN. Those which come from the last one would in turn yield without difficulty the following: Let $a_n \uparrow \infty$ and $m_n = \int_{-\epsilon a_n}^{+\epsilon a_n} x\, dF_n'(x)$. If $\sum_n q_n(a_n t) \le q(t)$ and $\int_0^{\epsilon} t\, q(t)\, dt < \infty$, then $\frac{1}{a_n} \sum_{k=1}^{n} (X_k - m_k) \to 0$ a.s.
Take now the particular case: $a_n = n^{1/r}$ and $X_k$'s independent and identically distributed. From the stated result follows:

1. If $EX_k = m$ exists, then $\frac{1}{n} \sum_{k=1}^{n} X_k \to m$ a.s., and conversely (Kolmogorov).

2. If $0 < r < 2$, $r \neq 1$, $E|X_k|^r < \infty$ and $\lim_{a \to \infty} \int_{-a}^{+a} x\, dF_k(x) = 0$, then $n^{-1/r} \sum_{k=1}^{n} X_k \to 0$ a.s. (Marcinkiewicz).


Other conditions for SLLN, in the case of dependence, are known (Lévy [27], 
Loéve [32]). 

The above result of Kolmogorov is a particular case of the celebrated ergodic theorem (Birkhoff [3]) which can be considered as a SLLN for a special case of dependence. Let $A_n$ be an event defined on the set $\{X_{k_1}, \ldots, X_{k_n}\}$ and let $A_n^{(m)}$ be an event defined in the same manner on the translated set $\{X_{k_1+m}, \ldots, X_{k_n+m}\}$. The sequence $\{X_k\}$ is called stationary if $P(A_n^{(m)}) = P(A_n)$ for every finite set $\{k_1, \ldots, k_n\}$ and every finite $m$. The ergodic theorem states that if the sequence $\{X_k\}$ is stationary and $E|X_k| < \infty$, then $\frac{1}{n} \sum_{k=1}^{n} X_k$ converges a.s.^10

However an unsatisfactory feature of Birkhoff's theorem (and of its extensions) is that the conditions are not asymptotic (they have to be satisfied for every $n$ and not for $n \to \infty$) while the conclusion is an asymptotic one. Let us only mention that more satisfactory ones, at least from this point of view, which contain the previous ones, can be found.


10 For about fifteen years Khintchine, Kolmogorov, Wiener, Yosida and Kakutani, F. Riesz, worked to simplify the proof of this theorem. It is only lately that its domain of validity has been extended by Hurewicz, by Halmos, and by Dunford and Miller. See also a forthcoming paper by the author in the Proceedings of the Second Berkeley Symposium.
The bird's-eye view above of the SCLP shows that this problem is only in a tentative stage, perhaps because no adequately powerful methods or no adequately general approach to the problem had been found until now.


A RANDOM VARIABLE RELATED TO THE SPACING OF SAMPLE VALUES

By B. SHERMAN^1

University of Southern California


1. Introduction and summary. Let $x$ be a random variable with continuous distribution function $F(x)$. Then $y = F(x)$ is a random variable uniformly distributed over [0, 1]. If $x_1, x_2, \ldots, x_n$ is an ordered sample of $n$ values from the population $F(x)$ then $y_1, y_2, \ldots, y_n$ ($y_i = F(x_i)$) is an ordered sample of $n$ values from a uniform distribution over [0, 1]. For $n$ large it is reasonable to expect that the $y_i$ should be fairly uniformly spaced. Measures of the deviation from uniform spacing can be devised in various ways. Thus Kimball [2] has studied the random variable

$$\alpha^2 = \sum_{i=1}^{n+1} \left(F(x_i) - F(x_{i-1}) - \frac{1}{n+1}\right)^2,$$

where $x_0 = -\infty$ and $x_{n+1} = +\infty$, conjecturing that $\alpha^2$ is asymptotically normally distributed. Moran [3] has studied the random variable

$$\beta = \sum_{i=1}^{n+1} (F(x_i) - F(x_{i-1}))^2,$$

which differs from $\alpha^2$ only by the quantity $-2/(n+1) + (n+1)^{-1}$, and has proved that $\beta$ is asymptotically normally distributed. Somewhat related to these two random variables is the quantity $\omega^2$ introduced by Smirnoff [4]. This is

$$\omega^2 = \int_{-\infty}^{+\infty} (F^*(x) - F(x))^2\, dF(x),$$

although it is slightly more generally defined in Smirnoff's paper. Here $F^*(x)$ is the sample distribution function ([1], page 325) of a sample of $n$ values from the population with continuous distribution function $F(x)$. The variable $\omega^2$ may be written ([1], page 451)

$$\omega^2 = \frac{1}{12n^2} + \frac{1}{n} \sum_{i=1}^{n} \left(F(x_i) - \frac{2i-1}{2n}\right)^2.$$

$(2i - 1)/2n$ is the midpoint of the interval $((i-1)/n,\ i/n)$. Thus, if [0, 1] is partitioned into $n$ equal subintervals then $\omega^2$ measures the deviation of the sample values $y_i = F(x_i)$, $i = 1, 2, \ldots, n$, from the midpoints of these intervals. Smirnoff has investigated the asymptotic behavior of $\omega^2$, obtaining a rather complicated non-normal asymptotic distribution.


1 I wish to thank Professors J. W. Tukey and S. S. Wilks for their helpful suggestions and criticism.
It is possible to construct a definition of deviation from uniform spacing which permits a broader investigation than these random variables. This is

$$\omega_n = \frac{1}{2} \sum_{i=1}^{n+1} \left| F(x_i) - F(x_{i-1}) - \frac{1}{n+1} \right|,$$

where again $x_0 = -\infty$ and $x_{n+1} = +\infty$ and $F(x)$ is a continuous distribution function. (In Theorems 3 and 4 it is assumed additionally that $F'(x)$ exists and is continuous except for a finite number of points.) It is to be noted that

$$0 \le \omega_n \le \frac{n}{n+1}.$$
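
The definition of $\omega_n$ translates directly into a short computation. The following is a minimal sketch; the function name `sherman_omega` and its two-argument interface (a sample and a callable distribution function) are assumptions made for illustration.

```python
def sherman_omega(sample, cdf):
    """omega_n = (1/2) * sum_{i=1}^{n+1} |F(x_i) - F(x_{i-1}) - 1/(n+1)|,
    with the conventions F(x_0) = 0 and F(x_{n+1}) = 1."""
    n = len(sample)
    # transformed values y_i = F(x_i) in increasing order, padded with 0 and 1
    y = [0.0] + sorted(cdf(x) for x in sample) + [1.0]
    target = 1.0 / (n + 1)
    return 0.5 * sum(abs(y[i] - y[i - 1] - target) for i in range(1, n + 2))

uniform_cdf = lambda x: x   # hypothesized F: uniform on [0, 1]
evenly_spaced = sherman_omega([0.25, 0.5, 0.75], uniform_cdf)   # perfectly even spacing
all_at_zero = sherman_omega([0.0, 0.0, 0.0], uniform_cdf)       # most uneven spacing
```

The two calls illustrate the bounds: perfectly even spacing gives 0, and a sample piled up at one end gives the upper bound $n/(n+1)$.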


Generally speaking use of the absolute value in circumstances like this is an undesirable procedure, but it turns out that $\omega_n$ is relatively easy to handle, allowing a fairly simple calculation of its moments (which are independent of $F(x)$). These are ($\mu = \min(k, n)$)

$$\alpha_{nk} = E(\omega_n^k) = \binom{n+k}{k}^{-1} \sum_{s=0}^{\mu-1} \binom{n+1}{s+1} \binom{k-1}{s} \left(\frac{n-s}{n+1}\right)^{n+k}.$$

Thus in particular the mean of $\omega_n$ is

$$E(\omega_n) = \left(\frac{n}{n+1}\right)^{n+1}$$

and the variance is

$$D^2(\omega_n) = \frac{2n^{n+2} + n(n-1)^{n+2}}{(n+2)(n+1)^{n+2}} - \left(\frac{n}{n+1}\right)^{2n+2} \sim \frac{2e-5}{e^2 n}.$$
These results will be established in Theorem 1. From the moments the charac- 
teristic function of w, may be obtained, and indeed in finite terms. From the 
characteristic function the distribution function of w, may be readily calculated. 
The distribution function is written out explicitly at the end of Theorem 1. 
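
The moment formula stated above can be evaluated in exact rational arithmetic. The sketch below is an illustration under assumptions: the function names are hypothetical, and the formula is taken as stated in the text for $k \ge 1$. For $n = 1$ one has $\omega_1 = |y_1 - \frac12|$ with $y_1$ uniform on [0, 1], so that $E(\omega_1^k) = 1/((k+1)2^k)$ gives an independent check.

```python
from fractions import Fraction
from math import comb, e

def sherman_moment(n, k):
    """E(omega_n^k), k >= 1, from the closed formula, as an exact Fraction."""
    mu = min(k, n)
    total = sum(
        comb(n + 1, s + 1) * comb(k - 1, s) * Fraction(n - s, n + 1) ** (n + k)
        for s in range(mu)          # s = 0, 1, ..., mu - 1
    )
    return total / comb(n + k, k)

def sherman_mean_var(n):
    """Exact mean and variance of omega_n."""
    mean = sherman_moment(n, 1)
    return mean, sherman_moment(n, 2) - mean ** 2

mean_200, var_200 = sherman_mean_var(200)
scaled_variance = 200 * float(var_200)   # compare with (2e - 5) / e^2
limit_constant = (2 * e - 5) / e ** 2    # roughly 0.0591
```

For moderately large $n$ the product $n D^2(\omega_n)$ is already close to $(2e-5)/e^2$, in line with the asymptotic form of the variance.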
To determine the asymptotic distribution of the standardized variable

$$\frac{\omega_n - E(\omega_n)}{D(\omega_n)},$$

it is sufficient to examine the behaviour as $n \to \infty$ of the moments of this variable or equivalently the moments of the variable

$$\left(\frac{e^2 n}{2e-5}\right)^{1/2} \left(\omega_n - \frac{1}{e}\right).$$

For it is easy to show that if the moments of the standardized variable approach the moments of a unique distribution function $F(x)$ then the distribution function of the standardized variable approaches $F(x)$. In this manner it is proved in Theorem 2 that the distribution function of the standardized variable approaches normality.
Since the asymptotic distribution of the standardized variable

$$\frac{\omega_n - E(\omega_n)}{D(\omega_n)}$$

is known it may be used as a test for goodness of fit if the number of sample values is large. Thus suppose $x_1, x_2, \ldots, x_n$ is an ordered sample of $n$ values from some population and we wish to test the hypothesis that the population has the distribution function $F(x)$. Then we calculate the quantity

$$X_n = \left| \frac{1}{D(\omega_n)} \left[ \frac{1}{2} \sum_{i=1}^{n+1} \left| F(x_i) - F(x_{i-1}) - \frac{1}{n+1} \right| - E(\omega_n) \right] \right|$$

and if this quantity exceeds a certain value which depends on the level of significance at which we are working we reject the hypothesis. Let us say that $P(X_n > A) = B$. The probability of rejecting the hypothesis when it is indeed true is then precisely $B$ and this is small if $A$ is sufficiently large. But suppose that the hypothesis is false and the sample values come from a population whose distribution function $G(x) \not\equiv F(x)$. Then we would desire the following property to hold for the random variable $X_n$, namely, for any fixed positive $A$ the probability that $X_n$ exceeds $A$ approaches 1 as $n \to \infty$. For in this case (and when $n$ is large) we are almost certain to reject the null hypothesis when it is false. A test for goodness of fit which satisfies this criterion, i.e. where the probability of rejection approaches 1 as $n \to \infty$ when the null hypothesis is false, is called consistent by Wald and Wolfowitz [5]. We wish to prove then that the test for goodness of fit which uses the random variable $X_n$ is consistent. To express the matter formally we wish to prove that (the probability density element of $x_1, x_2, \ldots, x_n$ is $n!\, dG(x_1)\, dG(x_2) \cdots dG(x_n)$ in the region $-\infty < x_1 < x_2 < \cdots < x_n < +\infty$ and zero outside that region)

$$\lim_{n \to \infty} n! \int \cdots \int_{D_n} dG(x_1) \cdots dG(x_n) = \begin{cases} \left(\frac{2}{\pi}\right)^{1/2} \int_A^{\infty} e^{-x^2/2}\, dx & \text{if } F(x) \equiv G(x), \\ 1 & \text{if } F(x) \not\equiv G(x), \end{cases}$$

where $D_n$ is the domain

$$-\infty < x_1 < x_2 < \cdots < x_n < +\infty, \qquad \left| \frac{1}{D(\omega_n)} \left[ \frac{1}{2} \sum_{i=1}^{n+1} \left| F(x_i) - F(x_{i-1}) - \frac{1}{n+1} \right| - E(\omega_n) \right] \right| > A.$$

The first assertion here is proved in Theorem 2. The second assertion is equivalent to proving that for any fixed positive $A$

$$(0.1) \qquad \lim_{n \to \infty} n! \int \cdots \int_{D_n} dG(x_1)\, dG(x_2) \cdots dG(x_n) = 0,$$

where $D_n$ is the domain

$$-\infty < x_1 < x_2 < \cdots < x_n < +\infty,$$

$$E(\omega_n) - A D(\omega_n) < \frac{1}{2} \sum_{i=1}^{n+1} \left| F(x_i) - F(x_{i-1}) - \frac{1}{n+1} \right| < E(\omega_n) + A D(\omega_n),$$

when $F(x) \not\equiv G(x)$. Now $D(\omega_n)$ is of order $n^{-1/2}$, $E(\omega_n) = e^{-1} +$ terms of order $n^{-1}$ and $A$ is fixed. Hence it is sufficient to show that, if $x_1, x_2, \ldots, x_n$ is an ordered sample of $n$ values from a population with distribution function $G(x)$, then the random variable

$$\Omega_n = \frac{1}{2} \sum_{i=1}^{n+1} \left| F(x_i) - F(x_{i-1}) - \frac{1}{n+1} \right|$$

(it is necessary to draw a distinction between $\omega_n$ and $\Omega_n$ since $F(x) \not\equiv G(x)$) has a mean $L_n \to L \neq e^{-1}$ and a variance $D^2(\Omega_n) \to 0$. For then we have, when $n$ is large enough so that the interval

$$[E(\omega_n) - A D(\omega_n),\ E(\omega_n) + A D(\omega_n)]$$

falls outside $[L - \frac12 |L - e^{-1}|,\ L + \frac12 |L - e^{-1}|]$ and $|L_n - L| < \frac14 |L - e^{-1}|$,

$$P(E(\omega_n) - A D(\omega_n) < \Omega_n < E(\omega_n) + A D(\omega_n)) \le P(|\Omega_n - L| \ge \tfrac12 |L - e^{-1}|) \le P(|\Omega_n - L_n| \ge \tfrac14 |L - e^{-1}|) \le \frac{E(|\Omega_n - L_n|^2)}{\frac{1}{16} |L - e^{-1}|^2} \to 0,$$

and this implies (0.1).
But now in Theorem 3 it is shown that the mean of the random variable $\Omega_n$ is (writing $k(x) = GF^{-1}(x)$, $k(x)$ a monotonic function such that $k(0) = 0$ and $k(1) = 1$)

$$\int_0^{n/(n+1)} \left[ 1 - k\left(x + \frac{1}{n+1}\right) + k(x) \right]^n dx.$$

This expression approaches

$$\int_0^1 e^{-k'(x)}\, dx$$

and this integral can assume the value $e^{-1}$, which is its minimum relative to the class of monotonic functions such that $k(0) = 0$ and $k(1) = 1$, only when $k(x) = x$, i.e. $F(x) = G(x)$. Finally in Theorem 4 we prove that $D^2(\Omega_n) \to 0$ and thus it is established that the test for goodness of fit based on $X_n$ is consistent.
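
The two integrals in the consistency argument can be evaluated numerically for a concrete alternative. This is a minimal sketch under stated assumptions: the midpoint rule, the function names, and the alternative $k(x) = x^2$ (for example $F$ uniform on [0, 1] and $G(x) = x^2$) are illustrative choices, not taken from the text.

```python
import math

def mean_omega_alt(k, n, steps=20000):
    """Midpoint-rule value of the integral over (0, n/(n+1)) of
    [1 - k(x + 1/(n+1)) + k(x)]^n dx, the mean of Omega_n per Theorem 3."""
    upper = n / (n + 1)
    h = upper / steps
    total = 0.0
    for j in range(steps):
        x = (j + 0.5) * h
        total += (1.0 - k(x + 1.0 / (n + 1)) + k(x)) ** n
    return total * h

def limit_integral(k_prime, steps=20000):
    """Midpoint-rule value of the integral over (0, 1) of exp(-k'(x)) dx."""
    h = 1.0 / steps
    return sum(math.exp(-k_prime((j + 0.5) * h)) for j in range(steps)) * h

null_mean = mean_omega_alt(lambda x: x, 10)        # k(x) = x, i.e. F = G
alt_limit = limit_integral(lambda x: 2.0 * x)      # k(x) = x^2, so k'(x) = 2x
alt_mean = mean_omega_alt(lambda x: x * x, 200)    # finite-n mean under the alternative
```

With $k(x) = x$ the first integral reduces to $(n/(n+1))^{n+1}$, the mean of $\omega_n$; with $k(x) = x^2$ the limit is $(1 - e^{-2})/2$, which is strictly greater than $e^{-1}$, so the mean of $\Omega_n$ is bounded away from $e^{-1}$ as the argument requires.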

2. Moments and asymptotic distribution of $\omega_n$.

THEOREM 1. Let $F(x)$ be a continuous distribution function. If $x_1, x_2, \ldots, x_n$ is an ordered sample of $n$ values from the population whose distribution function is $F(x)$ then the random variable

$$\omega_n = \frac{1}{2} \sum_{i=1}^{n+1} \left| F(x_i) - F(x_{i-1}) - \frac{1}{n+1} \right|,$$

where $x_0 = -\infty$ and $x_{n+1} = +\infty$, has the moments

$$\alpha_{nk} = E(\omega_n^k) = \binom{n+k}{k}^{-1} \sum_{s=0}^{\mu-1} \binom{n+1}{s+1} \binom{k-1}{s} \left(\frac{n-s}{n+1}\right)^{n+k},$$

where $\mu = \min(k, n)$.
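The probability density element of the $x_i$ is ([6], page 90)

$$n!\, dF(x_1)\, dF(x_2) \cdots dF(x_n)$$

in the domain $D_x$: $-\infty < x_1 < x_2 < \cdots < x_n < +\infty$ and zero outside of this domain. Then

$$\alpha_{nk} = n! \int \cdots \int_{D_x} \left[ \frac{1}{2} \sum_{i=1}^{n+1} \left| F(x_i) - F(x_{i-1}) - \frac{1}{n+1} \right| \right]^k dF(x_1)\, dF(x_2) \cdots dF(x_n).$$

If we make the transformation $y_i = F(x_i)$, $i = 1, 2, \ldots, n$, then

$$\alpha_{nk} = n! \int \cdots \int_{D_y} \left[ \frac{1}{2} \sum_{i=1}^{n+1} \left| y_i - y_{i-1} - \frac{1}{n+1} \right| \right]^k dy_1\, dy_2 \cdots dy_n,$$

where $D_y$ is the domain $0 < y_1 < y_2 < \cdots < y_n < 1$, thus indicating that the moments of $\omega_n$ (and therefore also the distribution function of $\omega_n$) are independent of $F(x)$. Here $y_0 = 0$ and $y_{n+1} = 1$. The transformation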


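$$\begin{aligned} u_1 &= y_1, & y_1 &= u_1, \\ u_2 &= y_2 - y_1, & y_2 &= u_1 + u_2, \\ &\;\;\vdots & &\;\;\vdots \\ u_n &= y_n - y_{n-1}, & y_n &= u_1 + u_2 + \cdots + u_n, \\ u_{n+1} &= y_{n+1} - y_n, & y_{n+1} &= u_1 + u_2 + \cdots + u_n + u_{n+1} = 1, \end{aligned}$$

whose Jacobian is 1, then yields

$$\alpha_{nk} = n! \int \cdots \int_{D_u} \left[ \frac{1}{2} \sum_{i=1}^{n+1} \left| u_i - \frac{1}{n+1} \right| \right]^k du_1\, du_2 \cdots du_n$$

$$= n! \int \cdots \int_{D_u} \left[ \frac{1}{2} \sum_{i=1}^{n} \left| u_i - \frac{1}{n+1} \right| + \frac{1}{2} \left| u_1 + u_2 + \cdots + u_n - \frac{n}{n+1} \right| \right]^k du_1\, du_2 \cdots du_n,$$

where $D_u$ is the domain $\sum_{i=1}^{n} u_i < 1$, $u_i > 0$, $i = 1, 2, \ldots, n$.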


The domain $D_u$ can be regarded as the union of $2^{n+1} - 2$ subdomains in the following way. First the hyperplane $u_1 + u_2 + \cdots + u_n = n/(n+1)$ divides the domain into two parts. In the part of the domain below the hyperplane, i.e. where $u_1 + u_2 + \cdots + u_n < n/(n+1)$, we have a subdomain defined by the statement: $r$ of the variables $u_i$ are greater than $(n+1)^{-1}$ and the residual group of $n - r$ $u_i$ are less than $(n+1)^{-1}$. There are $\binom{n}{r}$ such subdomains and it is clear that, because of the symmetry in the $u_i$, the integral of

$$\left[ \frac{1}{2} \sum_{i=1}^{n+1} \left| u_i - \frac{1}{n+1} \right| \right]^k$$

over each such subdomain is the same. There are altogether $\sum_{r=0}^{n-1} \binom{n}{r} = 2^n - 1$ such subdomains. $r \neq n$ because of the inequality $u_1 + u_2 + \cdots + u_n < n/(n+1)$. In the part of the domain above the hyperplane

$$u_1 + u_2 + \cdots + u_n = n/(n+1),$$

i.e. where $u_1 + u_2 + \cdots + u_n > n/(n+1)$, the reasoning is exactly the same except that here $r \neq 0$. Thus we may write


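$$\alpha_{nk} = n! \sum_{r=0}^{n-1} \binom{n}{r} \int \cdots \int_{D_{r1}} \left[ \sum_{i=r+1}^{n} \left( \frac{1}{n+1} - u_i \right) \right]^k du_1\, du_2 \cdots du_n + n! \sum_{r=1}^{n} \binom{n}{r} \int \cdots \int_{D_{r2}} \left[ \sum_{i=1}^{r} \left( u_i - \frac{1}{n+1} \right) \right]^k du_1\, du_2 \cdots du_n,$$

where $D_{r1}$ is the domain

$$\sum_{i=1}^{n} u_i < \frac{n}{n+1}, \qquad u_i > \frac{1}{n+1},\ i = 1, 2, \ldots, r, \qquad 0 < u_i < \frac{1}{n+1},\ i = r+1, \ldots, n,$$

and $D_{r2}$ is the domain

$$\frac{n}{n+1} < \sum_{i=1}^{n} u_i < 1, \qquad u_i > \frac{1}{n+1},\ i = 1, 2, \ldots, r, \qquad 0 < u_i < \frac{1}{n+1},\ i = r+1, \ldots, n.$$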

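If we introduce the variables

$$z_i = u_i - \frac{1}{n+1},\ i = 1, 2, \ldots, r; \qquad z_i = \frac{1}{n+1} - u_i,\ i = r+1, \ldots, n,$$

we get

$$\alpha_{nk} = n! \sum_{r=0}^{n-1} \binom{n}{r} \int \cdots \int_{A_{r1}} \left( \sum_{i=r+1}^{n} z_i \right)^k dz_1 \cdots dz_n + n! \sum_{r=1}^{n} \binom{n}{r} \int \cdots \int_{A_{r2}} \left( \sum_{i=1}^{r} z_i \right)^k dz_1 \cdots dz_n,$$

where $A_{r1}$ is the domain

$$\sum_{i=1}^{r} z_i < \sum_{i=r+1}^{n} z_i, \qquad z_i > 0\ (i = 1, 2, \ldots, r), \qquad \frac{1}{n+1} > z_i > 0\ (i = r+1, \ldots, n),$$

and $A_{r2}$ is the domain

$$\sum_{i=r+1}^{n} z_i < \sum_{i=1}^{r} z_i < \frac{1}{n+1} + \sum_{i=r+1}^{n} z_i, \qquad z_i > 0\ (i = 1, 2, \ldots, r), \qquad \frac{1}{n+1} > z_i > 0\ (i = r+1, \ldots, n).$$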
To effect the integrations with respect to the variables $z_1, z_2, \ldots, z_r$ we take as volume element in the $r$-space of $z_1, z_2, \ldots, z_r$ the volume between the hyperplanes $z_1 + z_2 + \cdots + z_r = C$, $z_i > 0$ and $z_1 + z_2 + \cdots + z_r = C + dC$, $z_i > 0$. This volume element is $\frac{C^{r-1}}{(r-1)!}\, dC$. Thus


n 


1/n+1 1/n+1 uiea™ cr n k 
oat = 0 (” )f | | a. (+) dena +++ dm 
r=0 0 0 (r — 1)! i=rtl 


n 


1/n+1 1/n+1 (1/n+1)+ . 7M cit 
oF J 
|  f \ sememwae EW dliiiy ++ diy 
r 0 0 > (r - 1)! 


2% 
t=rt+1 


1/n+1 1/n+1 1 -_ 
Tr 
[ we | = (Zr41 + °°: +2n) d2r41 -++ dZy 
0 0 r! 


1/n+1 1/n+1 1 


k+r 


1/n+1 1/n+1 1 
yf wal (k + r)(r — 1)! 


(Zr4a + e+ Hen)" densa +++ den. 


In order to perform these integrations we use the formula 


A A 
eae 
which is established immediately by induction on $n$. Then


SS (- 1)" (k +2)! ") n— ") ( q y~ 
au = nt SE r! st ( ‘eer 
owe (Hl ft (k+r—D! (n\ fn —r\ fit is 
+ ar eer OCT) GH 


i .(—1)"" 7 (k+r-—1)! n—r . 
myo (r— 1)! (r+hk)! (")( q )(<44) , 


The first of these double sums is equal to 


nik! < n-r—q@ (n\ (n—q\(k +r q \"** 
wm (CT VCE) A) 


EY EO EC 901) 


Let us assume first that n = k. The expression within the brackets is the coef- 
ficient of x"? in (1 — x)" “(1/(1 — x2)*™") = (1 — 2)” **" and this is £0 only 


when g = » — k and then it has the value : . ) Thus the first double sum is 











equal to 


n+k\" < ( k y(“)( q ~ 
Ct) B-see 
_(nt+k k\ (n\ (n —s\"*™* 
Nk =o \s/ \s/\n +1) * 
Similarly the second double sum is equal to 
r 4+ yi > k- ') n ) n— \ 
k = 8 sti/\n+1 : 
and the third is equal to 


VECO 


Thus, using the identity 


(‘) "y+ ("> *) ni) a “yet T’) 
s/\s s s+1 s—1)\sJ/ \s+1 s }’ 
we get

$$\alpha_{nk} = \binom{n+k}{k}^{-1} \sum_{s=0}^{k-1} \binom{n+1}{s+1} \binom{k-1}{s} \left(\frac{n-s}{n+1}\right)^{n+k}.$$

If however $k > n$ then a similar argument shows that we get an expression for $\alpha_{nk}$ which differs from the above only in the upper limit of the summation, which is $n - 1$ in this case. Thus the theorem is proved.
The distribution function of $\omega_n$ is


n—r—l1 q 
- 1 
rani E Bow (Nt) 
anit 2 BOI a) et 
(eta?) (2aay so” 
n n+ 1 n+ 1 , 
where r is the non-negative integer determined by the inequality 


en wt r+ 

soi ** sey 
F(x) = 0 when x S 0, F(x) = 1 when z 2 n/(n + 1) and F(z) is a polynomial 
of degree n in each of the intervals 


ne | a ; 
. Be. = } et 
G—4,4)): t ~ * n 

THEOREM 2. The random variable $\omega_n$ is asymptotically normally distributed $(E(\omega_n), D(\omega_n))$; i.e., the distribution function of the standardized variable

$$\frac{\omega_n - E(\omega_n)}{D(\omega_n)}$$

approaches

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt.$$

It is sufficient to prove that the moments of the standardized variable approach the moments of the normal distribution. For in general it is known that if the moments $\alpha_{nk}$ of $F_n(x)$ approach the moments $\alpha_k$ of a uniquely determined distribution function $F(x)$, then $F_n(x)$ converges to $F(x)$ in every continuity point of the latter (M. G. Kendall, Advanced Theory of Statistics, Vol. 1, Third edition, Charles Griffin and Co., 1943, pp. 110-112).


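Now $E(\omega_n) \to \frac{1}{e}$ and $D^2(\omega_n) \sim \frac{2e-5}{e^2 n} = \frac{c}{n}$, where $c = \frac{2e-5}{e^2}$, so that the two variables $\frac{\omega_n - E(\omega_n)}{D(\omega_n)}$ and $\left(\frac{n}{c}\right)^{1/2} \left(\omega_n - \frac{1}{e}\right)$ have the same limiting distribution. Thus it is sufficient to prove that the moments of $\left(\frac{n}{c}\right)^{1/2} \left(\omega_n - \frac{1}{e}\right)$ tend to the moments of the normal distribution. In the following argument we take $n \ge k$ since $n \to \infty$.

$$(2.1) \qquad E\left[ \left(\frac{n}{c}\right)^{m/2} \left(\omega_n - \frac{1}{e}\right)^m \right] = \frac{n^{m/2}\, m!}{(2e-5)^{m/2}} \left[ \frac{(-1)^m}{m!} + \sum_{k=1}^{m} \sum_{s=0}^{k-1} \frac{(-1)^{m-k}\, n!\, e^k}{(n+k)!\,(m-k)!} \binom{n+1}{s+1} \binom{k-1}{s} \left(\frac{n-s}{n+1}\right)^{n+k} \right].$$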
Suppose now that it has been proved that $E\left[ \left(\frac{n}{c}\right)^m \left(\omega_n - \frac{1}{e}\right)^{2m} \right]$ tends to a finite limit as $n \to \infty$, i.e., that the limiting moments of order $2m$ exist, $m = 1, 2, \ldots$. If $m$ is odd

$$\left| E\left[ \left(\frac{n}{c}\right)^{m/2} \left(\omega_n - \frac{1}{e}\right)^m \right] \right| \le \left\{ E\left[ \left(\frac{n}{c}\right)^m \left(\omega_n - \frac{1}{e}\right)^{2m} \right] \right\}^{1/2}.$$

Hence, if $m$ is odd, $E\left[ \left(\frac{n}{c}\right)^{m/2} \left(\omega_n - \frac{1}{e}\right)^m \right]$ is bounded as $n \to \infty$. Now the expression in the bracket on the right of (2.1) can be expanded in a convergent power series in $n^{-1}$ provided that $n > m$. Because of the factor $n^{m/2}$ and because the left hand side of (2.1) is bounded as $n \to \infty$ this power series must have a term in $n^{-p}$, where $p = \frac{m+1}{2}$ (since $m$ is odd), as its initial non-vanishing term. But then the left hand side of (2.1) must approach 0 as $n \to \infty$. Thus if the limiting moments of even order exist the limiting moments of odd order are zero. We may now restrict the discussion to even order moments.

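Replacing $m$ by $2m$ in (2.1)

$$E\left[ \left(\frac{n}{c}\right)^m \left(\omega_n - \frac{1}{e}\right)^{2m} \right] = \frac{n^m (2m)!}{(2e-5)^m} \left[ \frac{1}{(2m)!} + \sum_{k=1}^{2m} \sum_{s=0}^{k-1} \frac{(-1)^k\, n!\, e^k}{(n+k)!\,(2m-k)!} \binom{n+1}{s+1} \binom{k-1}{s} \left(\frac{n-s}{n+1}\right)^{n+k} \right].$$

Let us introduce the index $q = k - s - 1$ which runs from 0 to $2m - 1$. Then

$$E\left[ \left(\frac{n}{c}\right)^m \left(\omega_n - \frac{1}{e}\right)^{2m} \right] = \frac{n^m (2m)!}{(2e-5)^m} \left[ \frac{1}{(2m)!} + \sum_{q=0}^{2m-1} \sum_{k=q+1}^{2m} \frac{(-1)^k\, n!\, e^k}{(n+k)!\,(2m-k)!} \binom{n+1}{k-q} \binom{k-1}{q} \left(\frac{n-k+q+1}{n+1}\right)^{n+k} \right]$$

$$= \frac{n^m (2m)!}{(2e-5)^m} \left[ a_0 + \frac{a_1}{n} + \cdots + \frac{a_m}{n^m} + \cdots \right].$$

In order for $\lim_{n \to \infty} E\left[ \left(\frac{n}{c}\right)^m \left(\omega_n - \frac{1}{e}\right)^{2m} \right]$ to exist it is necessary to show that $a_i = 0$, $i = 0, 1, 2, \ldots, m - 1$. Then

$$\lim_{n \to \infty} E\left[ \left(\frac{n}{c}\right)^m \left(\omega_n - \frac{1}{e}\right)^{2m} \right] = \frac{a_m (2m)!}{(2e-5)^m}.$$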
If we determine the coefficient $a_{iq}$ of $n^{-i}$ in the expansion in powers of $n^{-1}$ of

$$(2.2) \qquad \sum_{k=q+1}^{2m} \frac{(-1)^k\, n!\, e^k}{(n+k)!\,(2m-k)!} \binom{n+1}{k-q} \binom{k-1}{q} \left(\frac{n-k+q+1}{n+1}\right)^{n+k} = \sum_{i=q}^{\infty} \frac{a_{iq}}{n^i},$$

we will then have

$$(2.3) \qquad a_j = \sum_{q=0}^{j} a_{jq}, \qquad j = 1, 2, \ldots, m.$$


It can be established at once that $a_0 = 0$. For if we set $q = 0$ in (2.2) and let $n \to \infty$ then (2.2) has the limit $\sum_{k=1}^{2m} \frac{(-1)^k}{k!\,(2m-k)!} = -\frac{1}{(2m)!}$. To determine the expansion of (2.2) in powers of $n^{-1}$ it is sufficient to focus attention on the expansion in powers of $n^{-1}$ of

$$\frac{n!}{(n+k)!}\, (n+1)\, n \cdots (n-k+q+2) \left(\frac{n-k+q+1}{n+1}\right)^{n+k} = \frac{(n+1)\, n \cdots (n-k+q+2)}{(n+k)(n+k-1) \cdots (n+1)} \left(\frac{n-k+q+1}{n+1}\right)^{n+k}$$

or equivalently on the expansion in powers of $x$ (with $n = 1/x$) of the function

$$\frac{x^q (1-x)(1-2x) \cdots (1-(k-q-2)x)}{(1+2x)(1+3x) \cdots (1+kx)} \left(\frac{1-(k-q-1)x}{1+x}\right)^{(1/x)+k} = x^q (a_{kq0} + a_{kq1} x + a_{kq2} x^2 + \cdots) = x^q F(x).$$
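Here $a_{kq0} = e^{q-k}$ and the other coefficients may be obtained by a recursion formula. Thus:

$$a_{kqp} = \frac{1}{p!} D^{p}_{x=0} F(x) = \frac{1}{p!} D^{p-1}_{x=0} [F(x)\, D \log F(x)] = \frac{1}{p!} \sum_{s=0}^{p-1} \binom{p-1}{s} D^{p-s-1}_{x=0} F(x)\, D^{s+1}_{x=0} \log F(x).$$

But

$$D^{s+1}_{x=0} \log F(x) = D^{s+1}_{x=0} \left[ \left(\frac{1}{x} + k\right) \log(1 - (k-q-1)x) - \left(\frac{1}{x} + k\right) \log(1+x) + \sum_{i=1}^{k-q-2} \log(1 - ix) - \sum_{i=2}^{k} \log(1 + ix) \right]$$

$$= s! \left[ \frac{s+1}{s+2} \left( (-1)^s - (k-q-1)^{s+2} \right) - k \left( (k-q-1)^{s+1} + (-1)^s \right) - \sum_{i=1}^{k-q-2} i^{s+1} + (-1)^{s+1} \sum_{i=2}^{k} i^{s+1} \right] = s!\, b_{kqs},$$

so that

$$a_{kqp} = \frac{1}{p!} \sum_{s=0}^{p-1} \binom{p-1}{s} (p-s-1)!\, a_{kq(p-s-1)}\, s!\, b_{kqs} = \frac{1}{p} \sum_{s=0}^{p-1} a_{kq(p-s-1)}\, b_{kqs}.$$
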

Of $b_{kqs}$ we need merely notice that it is a polynomial in $k$ of degree $s + 2$ and that $b_{kq0} = -\frac{5}{2} k^2 + Ak + B$, where $A$ and $B$ depend on $q$ only. We wish to determine the value of $a_{kq(i-q)}$ and to this end we solve the system of linear equations

$$a_{kq0} = e^{q-k},$$

$$\frac{1}{p} \sum_{s=0}^{p-1} a_{kq(p-s-1)}\, b_{kqs} - a_{kqp} = 0, \qquad p = 1, 2, \ldots, i - q.$$
Qig(i-g) 18 therefore a quotient of two determinants. The determinant in the 
denominator has the value (—1)*“ while the determinant in the numerator 
can be expanded by its last column and is therefore the product of (—1)'%e**? 
and a determinant B,,; whose entries dag, a, 8 = 1, 2,--- 7% — gq, can be de- 
scribed as follows. If 8 > a + 1 then dag = 0. dacai1) = —1 and when 6 S a, 
das = beg(a—s) , 2 polynomial of degree a — B + 2. Thus diqci-g) = €  “Bigi . 


The determinant B;,,; is a polynomial of degree 2(i — q) in k and the term of 


this degree comes only from the product of the diagonal elements. For 
i—a 


Bigi = | dag | = © + |] dacca) Where (a) S a + 1 and (o(1), o(2), --- c(t — q)) 
q L Y 


i-q 


is a permutation of (1, 2,---i — q). The term[] dasa) has degree 


a=1 


1—q o“@¢@ 
> (a — a(a) + 5(a)) = >, d(a) where d(a) = 2 if (a) S a@ and (a) 
a=] a=1 


i-q 


if o(a) = a+ 1. But 7 5(a) = 2(4 — q) @ (a) = 2 ola) FS aeal(a) =a, 


a=] 

so that it is the product of the diagonal terms and only that product which gives 
to the term of degree 2(¢ — q) in the expansion. Thus 

1 


™ 1 (- yn jr 4 > 4 J 
(i — q)! : , —_— 


We are now in position to evaluate aj, . 


2m kk 
—1 z= % 
Aig == Zz. ee o'r ( Jaina 


k=qgt1 (2m — k)'(k — gq)! q 


2m (—1)*e? _ ') | 
x, (Qm — k)\k — g)! q Brgi 
ot gall (- > -  i-pe? _ P ‘ 
(¢ — q)! - k=qti (2m — k)'\(k — gq)! q 


2m (—1)*e? (‘ — ') l > | 
—- - A;k’ |. 
ai ot (2m — k)\(k — gq)! q 2 , 


Bui = (bigo)’ “ + terms of lower degree in k 
To complete the evaluation of $a_{iq}$ we observe that


i “a (—1)*k' “* if 1 = 2(m — q), 
=, Qm—bik—o! 
oe Te = Se — if 1 < 2(m — 9). 


(2.5) implies that aig = 0 if i < m and therefore a; = 0 if 7 < m. The proof 


1-1 , 

of (2.5) is brief. We note that ko ' = >> ¢; : : ’) , where c; is independent 
i=0 

of k and cj, = (l — 1)!. Then 


x (—1)*k' (' - ') _ s (—1)'k'"k! 1 r ~ “) 
keagti (2m — k)'(k — gq)! qd raqti Qi(k — q — 1)!(2m — g)!\k — @ 
l—1 2m 


> » (-1)' pa ‘e = “)(' +) 


j=0 k=qtl (2m — g)!qghk-—q—D!\k—-@q j 
l 


Pte t Ml Cr k+j )]. 
2, (2m — qg)!7!q! fa, - 1) +s? i 


The expression within the brackets is the coefficient of 2” *" in 
2m— 1 m—2q—j—2 

(l— zx)" a zie = (1 — 2)” **** and this is zero if 7 < 2(m — q) — 1 

and 1 if 7 = 2(m — q) — 1. Accordingly 


Ce (—1)*k’ k—1 
Aiea Teh 7 


0 if 1 = 2(m — q), 


[2(m — q) — I!2R(m — gq) —-1+q4+1}!_ 1 " 
Gm — gi2@ — @) — Mig! "a = ee 


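and (2.5) is established. Returning to (2.4), $a_{iq} = 0$ when $i < m$, while

$$a_{mq} = \frac{\left(-\frac{5}{2}\right)^{m-q} e^q}{(m-q)!\, q!} = \frac{1}{m!} \binom{m}{q} \left(-\frac{5}{2}\right)^{m-q} e^q,$$

and now applying this expression to (2.3)

$$a_m = \sum_{q=0}^{m} \frac{1}{m!} \binom{m}{q} \left(-\frac{5}{2}\right)^{m-q} e^q = \frac{1}{m!\, 2^m} (2e-5)^m.$$

Thus

$$\lim_{n \to \infty} E\left[ \left(\frac{n}{c}\right)^m \left(\omega_n - \frac{1}{e}\right)^{2m} \right] = \frac{a_m (2m)!}{(2e-5)^m} = \frac{(2m)!}{m!\, 2^m},$$

and these are precisely the even order moments of the normal distribution. Thus $\left(\frac{n}{c}\right)^{1/2} \left(\omega_n - \frac{1}{e}\right)$ is asymptotically normal and so is $\frac{\omega_n - E(\omega_n)}{D(\omega_n)}$.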
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The skewness 6; = (*) and kurtosis B2 = S of the standardized variable 
o oO 
Wn — E(wn) 
D(wn) 
356 


2 ~ 2 2 
a (6e° — 42e * 70) + O(n) =: ot“ O(n), 
n (2e — 5)? n 


1 24e° — 336e> + 1368e — 171 os 1.05 --- - 
he 0 tut. +eer* 
n (2e — 5)? n 


are 


By 


3. Consistency. According to previous discussion in order to prove the consistency of the test for goodness of fit based on the asymptotically normal variable $\frac{\omega_n - E(\omega_n)}{D(\omega_n)}$ it is sufficient to show that, if $x_1, x_2, \ldots, x_n$ is an ordered sample from a population whose distribution function is $G(x)$, then the limiting mean of the random variable $\frac{1}{2} \sum_{i=1}^{n+1} \left| F(x_i) - F(x_{i-1}) - \frac{1}{n+1} \right|$ is not equal to $e^{-1}$ if $F(x) \not\equiv G(x)$ and the limiting variance of this variable is zero. This is the content of the next two theorems. In connection with these theorems it is to be observed that, when $y = F(x)$ is continuous, $F^{-1}(y)$, $0 < y \le 1$, can be defined unambiguously by writing $F^{-1}(y) = [\sup x : y = F(x)]$ except for $y = 0$, and $F^{-1}(0) = -\infty$. The function $k(x) = GF^{-1}(x)$ is then a non-decreasing function mapping [0, 1] into [0, 1] and such that $k(0) = 0$ and $k(1) = 1$. Now if $F'(x)$ exists for all but a finite number of points and is never zero then $F^{-1}(x)$ is continuous and so is $k(x)$. If further $G'(x)$ and $F'(x)$ exist and are continuous except for a finite number of points then ($F'(x) \neq 0$) $k'(x)$ enjoys the same property. These remarks justify the substitutions and partial integrations that are effected in the course of the next two theorems.

THEOREM 3. Let $F(x)$ and $G(x)$ be continuous distribution functions whose derivatives exist and are continuous except for a finite number of points. If $x_1, x_2, \ldots, x_n$ is an ordered sample of $n$ values from the population whose distribution function is $G(x)$ then ($k(x) = GF^{-1}(x)$)

$$\lim_{n \to \infty} E\left[ \frac{1}{2} \sum_{i=1}^{n+1} \left| F(x_i) - F(x_{i-1}) - \frac{1}{n+1} \right| \right] = \lim_{n \to \infty} \int_0^{n/(n+1)} \left[ 1 - k\left(x + \frac{1}{n+1}\right) + k(x) \right]^n dx = \int_0^1 e^{-k'(x)}\, dx.$$

The integral $\int_0^1 e^{-k'(x)}\, dx$ has, relative to the class of monotonic functions such that $k(0) = 0$ and $k(1) = 1$, the minimum value $e^{-1}$ and assumes that value only when $k(x) = x$, i.e. $F(x) = G(x)$.

Let us suppose first that $F'(x) \neq 0$. Then $F^{-1}(x)$ is continuous and it is differentiable at all but a finite number of points as is also the function $GF^{-1}(x) = k(x)$.
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1 n+1 


528 (xe — Flea) — 1 


1 5 . 1 
5 B (|r) - 4|)+3 B(|1 - Fe) - + ,/) 


+58 (\Fe — F(a) — 4). 


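The joint probability density element of $x_{i-1}$ and $x_i$ is

\[ \frac{n!}{(i-2)! \, (n-i)!} \, G(x_{i-1})^{i-2} \, (1 - G(x_i))^{n-i} \, dG(x_{i-1}) \, dG(x_i) \]

in the domain $-\infty < x_{i-1} < x_i < +\infty$ and zero outside that domain. Hence

\[ \frac{1}{2} \sum_{i=2}^{n} E\left( \left| F(x_i) - F(x_{i-1}) - \frac{1}{n+1} \right| \right) = \frac{1}{2} n(n-1) \iint_{-\infty < X < Y < \infty} \left| F(Y) - F(X) - \frac{1}{n+1} \right| [1 - G(Y) + G(X)]^{n-2} \, dG(X) \, dG(Y), \]

and making the transformation $y = F(Y)$ and $x = F(X)$ the integral on the
right can be written

\[ \frac{1}{2} n(n-1) \iint_{0 < x < y < 1} \left| y - x - \frac{1}{n+1} \right| [1 - k(y) + k(x)]^{n-2} \, dk(x) \, dk(y) \]
\[ = \frac{1}{2} n(n-1) \int_0^1 \int_0^y \left( x - y + \frac{1}{n+1} \right) [1 - k(y) + k(x)]^{n-2} \, dk(x) \, dk(y) \]
\[ + n(n-1) \int_{1/(n+1)}^1 \int_0^{y - 1/(n+1)} \left( y - x - \frac{1}{n+1} \right) [1 - k(y) + k(x)]^{n-2} \, dk(x) \, dk(y). \]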
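The passage from the sum over $i$ of the joint densities to the single factor $n(n-1)[1 - G(Y) + G(X)]^{n-2}$ rests on the binomial theorem. The short Python sketch below (an illustration added here, with hypothetical function names) checks that identity numerically, writing $a$ for $G(X)$ and $b$ for $1 - G(Y)$.

```python
from math import factorial

def summed_density_weight(n, a, b):
    """Sum over i = 2..n of n!/((i-2)!(n-i)!) * a^(i-2) * b^(n-i)."""
    return sum(
        factorial(n) // (factorial(i - 2) * factorial(n - i)) * a ** (i - 2) * b ** (n - i)
        for i in range(2, n + 1)
    )

def closed_form(n, a, b):
    """n(n-1)(a+b)^(n-2), the value given by the binomial theorem."""
    return n * (n - 1) * (a + b) ** (n - 2)

print(summed_density_weight(7, 0.3, 0.45), closed_form(7, 0.3, 0.45))
```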


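Integrating partially with respect to $x$, the expression on the right becomes

\[ \frac{n}{2} \int_0^1 \frac{1}{n+1} \, dk(y) - \frac{n}{2} \int_0^1 \left( -y + \frac{1}{n+1} \right) [1 - k(y)]^{n-1} \, dk(y) - \frac{n}{2} \int_0^1 \int_0^y [1 - k(y) + k(x)]^{n-1} \, dx \, dk(y) \]
\[ - n \int_{1/(n+1)}^1 \left( y - \frac{1}{n+1} \right) [1 - k(y)]^{n-1} \, dk(y) + n \int_{1/(n+1)}^1 \int_0^{y - 1/(n+1)} [1 - k(y) + k(x)]^{n-1} \, dx \, dk(y), \]

and now integrating with respect to $y$

\[ \frac{1}{2} \sum_{i=2}^{n} E\left( \left| F(x_i) - F(x_{i-1}) - \frac{1}{n+1} \right| \right) = -\frac{1}{n+1} + \frac{1}{2} \int_0^1 [1 - k(x)]^n \, dx + \frac{1}{2} \int_0^1 k(x)^n \, dx \]
\[ - \int_{1/(n+1)}^1 [1 - k(x)]^n \, dx - \int_0^{n/(n+1)} k(x)^n \, dx + \int_0^{n/(n+1)} \left[ 1 - k\left( x + \frac{1}{n+1} \right) + k(x) \right]^n dx. \]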


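The other two terms in (3.1) are treated similarly. The probability density
element of $x_1$ is $n(1 - G(x_1))^{n-1} \, dG(x_1)$ so that

\[ \frac{1}{2} E\left( \left| F(x_1) - \frac{1}{n+1} \right| \right) = \frac{n}{2} \int_{-\infty}^{\infty} \left| F(x) - \frac{1}{n+1} \right| (1 - G(x))^{n-1} \, dG(x) = \frac{n}{2} \int_0^1 \left| x - \frac{1}{n+1} \right| (1 - k(x))^{n-1} \, dk(x) \]
\[ = \frac{1}{2(n+1)} - \frac{1}{2} \int_0^{1/(n+1)} (1 - k(x))^n \, dx + \frac{1}{2} \int_{1/(n+1)}^1 (1 - k(x))^n \, dx. \]

Similarly we find that

\[ \frac{1}{2} E\left( \left| 1 - F(x_n) - \frac{1}{n+1} \right| \right) = \frac{1}{2(n+1)} + \frac{1}{2} \int_0^{n/(n+1)} k(x)^n \, dx - \frac{1}{2} \int_{n/(n+1)}^1 k(x)^n \, dx. \]

Adding these three results, all terms except the last integral cancel, and

\[ E(\omega_n) = \int_0^{n/(n+1)} \left[ 1 - k\left( x + \frac{1}{n+1} \right) + k(x) \right]^n dx. \]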


This result is, however, independent of the hypothesis $F'(x) \ne 0$. For if $F'(x)$
is sometimes zero we may select a sequence of distribution functions $F_m(x)$,
$m = 1, 2, \cdots$, which converges everywhere to $F(x)$ and which is such that
$F_m'(x) \ne 0$. The $F_m(x)$ otherwise satisfy the conditions of the theorem. If
$\omega_{nm}$ is that function of $x_1, x_2, \cdots, x_n$ obtained by replacing $F(x)$ by
$F_m(x)$ in $\omega_n$ then $\omega_{nm}$ converges to $\omega_n$ for every fixed set of
$x_1, x_2, \cdots, x_n$, and $E(\omega_{nm})$ converges to $E(\omega_n)$ since both
$\omega_{nm}$ and $\omega_n$ are bounded by 1. Furthermore if $x_0$ is any value such
that $F'(x_0) \ne 0$ and $y_0 = F(x_0)$ then $F_m^{-1}(y_0)$ converges to
$F^{-1}(y_0) = x_0$. For if $x_1$ is a cluster point of the set $F_m^{-1}(y_0)$, then
there exists, for a given $\epsilon$, a sufficiently large $m$ such that
$|F(x_1) - F_m(x_1)| < \epsilon$ (because $F_m(x) \to F(x)$) while, for the same $m$,
$|F_m(x_1) - y_0| < \epsilon$ because of the continuity of $F_m(x)$. Thus
$|F(x_1) - y_0| < 2\epsilon$ and, since $\epsilon$ is arbitrary,
$F(x_1) = y_0 = F(x_0)$. So $x_1 = x_0$ since $F'(x_0) \ne 0$. Thus
$F_m^{-1}(y) \to F^{-1}(y)$ for any value of $y$ such that if $x$ is mapped into $y$
by $F(x)$ then $F'(x) \ne 0$. This set on the $y$ axis however includes all $y$
except for a set of measure zero and so $F_m^{-1}(y) \to F^{-1}(y)$ almost
everywhere. So $k_m(y) = GF_m^{-1}(y) \to GF^{-1}(y) = k(y)$ almost everywhere and

\[ \left[ 1 - k_m\left( y + \frac{1}{n+1} \right) + k_m(y) \right]^n \to \left[ 1 - k\left( y + \frac{1}{n+1} \right) + k(y) \right]^n \]

almost everywhere. Then

\[ \lim_{m \to \infty} \int_0^{n/(n+1)} \left[ 1 - k_m\left( x + \frac{1}{n+1} \right) + k_m(x) \right]^n dx = \int_0^{n/(n+1)} \left[ 1 - k\left( x + \frac{1}{n+1} \right) + k(x) \right]^n dx \]

since both integrands are bounded by 1. Therefore the equality

\[ E(\omega_{nm}) = \int_0^{n/(n+1)} \left[ 1 - k_m\left( x + \frac{1}{n+1} \right) + k_m(x) \right]^n dx \]

is preserved as $m \to \infty$.


Now $k(x)$ is a monotonic function and hence has a derivative almost everywhere.
Then

\[ \left[ 1 - k\left( x + \frac{1}{n+1} \right) + k(x) \right]^n = \left[ 1 - \frac{1}{n+1} \cdot \frac{k\left( x + \frac{1}{n+1} \right) - k(x)}{1/(n+1)} \right]^n \]

converges to $e^{-k'(x)}$ almost everywhere. If we write

\[ H_n(x) = \left[ 1 - k\left( x + \frac{1}{n+1} \right) + k(x) \right]^n \]

when $0 \le x \le \frac{n}{n+1}$ and $H_n(x) = 0$ when $\frac{n}{n+1} < x \le 1$, then

\[ \int_0^1 H_n(x) \, dx = \int_0^{n/(n+1)} \left[ 1 - k\left( x + \frac{1}{n+1} \right) + k(x) \right]^n dx \to \int_0^1 e^{-k'(x)} \, dx \]

as $n \to \infty$. The curve $y = e^{-x}$ lies always above its tangents and the
tangent at $x = 1$ is $y = -\frac{x}{e} + \frac{2}{e}$. Thus
$e^{-x} \ge -\frac{x}{e} + \frac{2}{e}$ for all $x$, equality holding only when
$x = 1$, and therefore $e^{-k'(x)} \ge -\frac{k'(x)}{e} + \frac{2}{e}$, equality
holding only when $k'(x) = 1$. So

\[ \int_0^1 e^{-k'(x)} \, dx \ge -\frac{1}{e} \int_0^1 k'(x) \, dx + \frac{2}{e}, \]

equality holding if and only if $k'(x) = 1$ almost everywhere. But for any
monotonic non-decreasing function

\[ \int_0^1 k'(x) \, dx \le k(1) - k(0), \]

equality holding if and only if $k(x)$ is absolutely continuous. Hence

\[ \int_0^1 e^{-k'(x)} \, dx \ge -\frac{1}{e} \int_0^1 k'(x) \, dx + \frac{2}{e} \ge \frac{1}{e}, \]

and the equality runs through if and only if $k(x)$ is an absolutely continuous
function such that $k'(x) = 1$ almost everywhere. But this is true of $k(x)$ if
and only if $k(x) = x$ and this in turn is true if and only if $F(x) = G(x)$.
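The two ingredients of this proof can be illustrated numerically. The Python sketch below is an added illustration, not the author's computation: the midpoint rule, the step counts, and the alternative $k(x) = x^2$ are assumptions chosen for the example. It evaluates the finite-$n$ integral $\int_0^{n/(n+1)} [1 - k(x + 1/(n+1)) + k(x)]^n dx$, compares it with $\int_0^1 e^{-k'(x)} dx$, and checks the tangent inequality $e^{-t} \ge (2 - t)/e$ on a grid.

```python
import math

def finite_n_mean(k, n, steps=20000):
    """Midpoint-rule value of the integral of [1 - k(x + 1/(n+1)) + k(x)]^n over [0, n/(n+1)]."""
    c = 1.0 / (n + 1)
    h = (n * c) / steps
    return h * sum((1.0 - k(h * (j + 0.5) + c) + k(h * (j + 0.5))) ** n for j in range(steps))

def limit_mean(k_prime, steps=20000):
    """Midpoint-rule value of the integral of exp(-k'(x)) over [0, 1]."""
    h = 1.0 / steps
    return h * sum(math.exp(-k_prime(h * (j + 0.5))) for j in range(steps))

def identity(x):
    return x

def square(x):                      # hypothetical alternative: k(x) = x^2, so k'(x) = 2x
    return x * x

null_limit = limit_mean(lambda x: 1.0)       # k(x) = x gives 1/e
alt_limit = limit_mean(lambda x: 2.0 * x)    # (1 - e^{-2})/2, larger than 1/e
alt_finite = finite_n_mean(square, 2000)     # finite-n value, close to alt_limit

# Tangent inequality used in the proof, checked on a grid of t in [0, 5].
tangent_ok = all(math.exp(-t) >= (2.0 - t) / math.e - 1e-12 for t in (j / 100 for j in range(501)))
print(null_limit, alt_limit, alt_finite, tangent_ok)
```

For this alternative the limiting mean exceeds $1/e$, in line with the minimum property stated in the theorem.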

THEOREM 4. The random variable $\omega_n$ has limiting variance zero; i.e.,

\[ \lim_{n \to \infty} E(\omega_n^2) = \left[ \int_0^1 e^{-k'(x)} \, dx \right]^2. \]


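As before we assume first that $F'(x) \ne 0$. Then

\[ E(\omega_n^2) = E\left[ \left( \frac{1}{2} \sum_{i=2}^{n} \left| F(x_i) - F(x_{i-1}) - \frac{1}{n+1} \right| \right)^2 \right] + E\left( \left| F(x_1) - \frac{1}{n+1} \right| \omega_n \right) + E\left( \left| 1 - F(x_n) - \frac{1}{n+1} \right| \omega_n \right) \]
\[ - \frac{1}{4} E\left[ \left( \left| F(x_1) - \frac{1}{n+1} \right| + \left| 1 - F(x_n) - \frac{1}{n+1} \right| \right)^2 \right]. \tag{4.1} \]

Suppose $[\sup x : k(x) = 0] = a$ and $[\inf x : k(x) = 1] = b$. We may then obtain
$\lim E\left( \left| F(x_1) - \frac{1}{n+1} \right| \omega_n \right)$ in the following manner:

\[ \left| E\left[ \left( \left| F(x_1) - \frac{1}{n+1} \right| - a \right) \omega_n \right] \right| \le \left[ E\left( F(x_1) - \frac{1}{n+1} - a \right)^2 \right]^{1/2} \left[ E(\omega_n^2) \right]^{1/2}. \tag{4.2} \]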


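But $\omega_n < 1$ so that $E(\omega_n^2)$ is bounded as $n \to \infty$. On the other hand

\[ E\left( F(x_1) - \frac{1}{n+1} - a \right)^2 = n \int_{-\infty}^{\infty} \left( F(x) - \frac{1}{n+1} - a \right)^2 (1 - G(x))^{n-1} \, dG(x) \]
\[ = n \int_0^1 \left( x - a - \frac{1}{n+1} \right)^2 (1 - k(x))^{n-1} \, dk(x) = \left( a + \frac{1}{n+1} \right)^2 + \int_0^1 2 \left( x - a - \frac{1}{n+1} \right) (1 - k(x))^n \, dx. \]

As $n \to \infty$ the expression on the right tends to
$a^2 + \int_0^a 2(x - a) \, dx = 0$.

Thus the expression on the right of (4.2) goes to zero as $n \to \infty$ and therefore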


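\[ \lim_{n \to \infty} E\left( \left| F(x_1) - \frac{1}{n+1} \right| \omega_n \right) = \lim_{n \to \infty} E(a \, \omega_n) = a \int_0^1 e^{-k'(x)} \, dx. \tag{4.3} \]

In a similar manner we obtain

\[ \lim_{n \to \infty} E\left( \left| 1 - F(x_n) - \frac{1}{n+1} \right| \omega_n \right) = (1 - b) \int_0^1 e^{-k'(x)} \, dx \tag{4.4} \]

and

\[ \lim_{n \to \infty} \frac{1}{4} E\left[ \left( \left| F(x_1) - \frac{1}{n+1} \right| + \left| 1 - F(x_n) - \frac{1}{n+1} \right| \right)^2 \right] = \frac{1}{4} (a + 1 - b)^2. \tag{4.5} \]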


The first term on the right of (4.1) remains to be investigated. We have 


1 2 
e[(bx ¥ | Feed - Fees) ~ + a) 


1 . iy 
(4.6) = 4 E | (Fees = F(x) ae 4) | 










































1 n—2 n | 1 
in + 52/2 2, |F@d — Few) - || Feo - F(es3) — || 
+25 | Pd — Few 1) — : F(2ixns) — F(x) - ; 1] 
9 a i i— n+ n+ 1 i+ z n + 1| : 
The joint probability density element of x; and 2; is 
n} av s i—2 \n-i ; 
G—2)!m—a! (1 G(x;-1))"~ G(x) dG(xi-1) dG(x;) 
2, so that 
52 |X (Feo - re oe +) | 
4° Lis ”" n+1 
a1) 1 . " 1 r 
= Zn — 1) I (Fay — F(X) - — i) 
—w<x<¥<o 
- [1 — @(Y) + G(X)]"* d@(X) d@(Y) 
dx. 









1 - i \ lh ae 
=ina-vof [ (u -2-—1 :) 1 — ky) + Ka)" dk(z) dk(y). 
In this latter double integral we integrate first with respect to $x$ and then with
respect to $y$ obtaining


—n— 3 1 ie 1 - a 
4(n + 1b? ti (v a n+ 4) [1 — k(y)]" dy — | (. + i-2) k(x)” dx 


[[ a -%@ + r@r ae ay, 


0<z<y<l 


and proceeding to the limit 


n , 1 2 
tim 42 (Fie — F(x) - —)| 
a 1 
—sfuwy-Sfa-nats ff aca 
2 Jo 2 Jo 2 


0<z<y<l 
k(z)=k(y) 


-70- +5 // dx dy. 


0<z<y<l 
k(z)=k(y) 


The joint probability density element of $x_{i-1}, x_i, x_{j-1}, x_j$ when $j > i + 1$ is


n! 


GHG =i — Hie — Hi FO WE») — Gea" 





[1 — G(x"? dG (xi-1) dG (x;) dG (x;~1) dG (x;), 


52[5 > | Fes — F(a) - ition | F(2;) — F(a) — ik | 


i=2 jmite | n+1)| n+ 1] 


* 5 n(n ~ ie -2De ~ 2 fl FY) — F(X) — — 


0<X<Y<U<¥ <1 


+1 
— G(Y) + G(X)]"“* dG(X) dG(Y) om dG(V) 


= 5 n(n — 1)(n — 2)(n-— 3) Mt. e- ot |lo-w- 4 


sipaeawents 


(4.8) | FV) ~ FW) ~ a 1 —G(V) + GU) 


- [1 — k(v) + k(u) — ky) + k(x)J"* dk(x) dk(y) dk(u) dk(v). 
The joint probability density element of $x_{i-1}, x_i, x_{i+1}$ is


n! 


G@—2)\n—i-— 1)! G(xi-1) ‘iia [1 —G (ei) dG (xi-1) dG(x;) dG (x11) 
and so
1 | | 1 
2 Le F(z) = F(@in) - n+1 | FQ) — F(a,) _ sei | 
a lp , 1 | 
=5nn—dm—-2) fff |ra-FrH- a 
-<narae “ 
(4.9) - FW) — F(Y) - ai! | [1 — G(V) + G(X)]"* dG(X) dG(Y) dG(vV) 
1 
= 5 n(n — 1)(n — 2) il ly-2-—4 | 
7 0<z<y<r<l 
\vu-y- a fl — kv) + k(x)]"~* dk(x) dk(y) dk(v). 
We introduce the symbol $S(p, q)$ as follows
[a it 
qSapt+— 
| = : 
S(p,q) = 3 
ps ifq> p+ ——. 


Then in the integral on the right of (4.8) we perform a partial integration with
respect to $u$ and add to the integral on the right of (4.9). We get


snln — in ~2 ) fff sll e- 1 


0<zr<y<r<l 


; — k(y) + k(a)]"~ dk(x) dk(y) dk(v) 


-lae-no-9 iff s0|y-2-—| 


0<z<y<ucr<l 
- {1 — kv) + k(u) — ky) + k(a)]"> dk(a) dk(y) dk(v) du, 
and now integrating with respect to v in the triple integral and performing par- 


tial integrations with respect to x and collecting terms the sum of (4.8) and 
(4.9) becomes 





ec we — 1 1 | = 2n(n — 1) 
An +1)? 24+ 1 lly - ai |b — BOI” ak) - ==> 


S(z, y{l — k(y) + k(x)]"* dx dk(y) + 4n(n — 1) 


0<z<y<l 


[lf Seo0|y- 2a 2) +600 - RI 


= 1 
| 
0<y<u<er<l 


- dk(y) dk(v) du + 3n(n — 1) 
If S(u, v)S(x, yl — kv) + k(u) — ky) + k(x)" dk(y) dk(v) da du. 


0<z<y<u<cr<l 
Now some tedious, although in principle straightforward, calculations show
that the first three terms of this expression approach


1 
(4.10) —-i— ig —}(1 — 5) + | er ae 
0 


that the triple integral approaches 


1 
(4.11) 1g + 1a(1 — 6) + 10° — a | cr ae, 
0 


and that the quadruple integral approaches 


1 1 
2 [/ e Ok O de du — [ e*'® dz — (1 — b) | e'® dy 
0 0 


0<z<u<l 


(4.12) 7 ; / dx du + (1 — b)? + 40(1 — 6) + 4. 


0<z<u<l 
k(z)=k(u) 


Thus collecting the results of (4.3), (4.4), (4.5), (4.7), (4.10), (4.11), and
(4.12) we have

\[ \lim_{n \to \infty} E(\omega_n^2) = 2 \iint_{0 < x < u < 1} e^{-k'(x) - k'(u)} \, dx \, du. \]

Since the integrand is symmetrical in the variables $u$ and $x$ we may write

\[ \lim_{n \to \infty} E(\omega_n^2) = \iint_{\substack{0 < x < 1 \\ 0 < u < 1}} e^{-k'(x) - k'(u)} \, dx \, du = \left[ \int_0^1 e^{-k'(x)} \, dx \right]^2, \tag{4.13} \]

and this proves the theorem in the case $F'(x) \ne 0$.
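Theorem 4 says that the second moment of the statistic approaches the square of its limiting mean, so its variance vanishes. A small simulation under $F = G$ makes this visible; the Python sketch below is an added illustration with assumed sample sizes, replication counts, and seed, and it repeats the statistic's definition so that it runs on its own.

```python
import random

def sherman_omega(u):
    """Spacing statistic for values u = F(x_i) in [0, 1], with assumed end points 0 and 1."""
    n = len(u)
    pts = [0.0] + sorted(u) + [1.0]
    target = 1.0 / (n + 1)
    return 0.5 * sum(abs(pts[i] - pts[i - 1] - target) for i in range(1, n + 2))

def mean_and_second_moment(n, reps, rng):
    """Monte Carlo estimates of E(omega_n) and E(omega_n^2) for uniform samples."""
    vals = [sherman_omega([rng.random() for _ in range(n)]) for _ in range(reps)]
    return sum(vals) / reps, sum(v * v for v in vals) / reps

rng = random.Random(1)
m_small, s_small = mean_and_second_moment(50, 400, rng)
m_large, s_large = mean_and_second_moment(800, 400, rng)
var_small = s_small - m_small ** 2   # sample variance at n = 50
var_large = s_large - m_large ** 2   # sample variance at n = 800, much smaller
print(var_small, var_large)
```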

Using the procedure of Theorem 3 we may however extend the theorem to
include the possibility that $F'(x)$ is sometimes zero. But it must be shown
additionally that the sequence $F_m(x)$ can be so chosen that $\omega_{nm}$ converges
to $\omega_n$ uniformly in $n$, i.e. that, for a given $\epsilon$,
$|\omega_{nm} - \omega_n| < \epsilon$ for $m$ sufficiently large and for any value of
$n$. If this is true then, observing that $0 \le \omega_{nm} + \omega_n \le 2$,
$|\omega_{nm}^2 - \omega_n^2| < 2\epsilon$ and

\[ |E(\omega_{nm}^2) - E(\omega_n^2)| \le E(|\omega_{nm}^2 - \omega_n^2|) \le 2\epsilon \]

independently of $n$. Letting $n \to \infty$

\[ \left| \left[ \int_0^1 e^{-k_m'(x)} \, dx \right]^2 - \lim_{n \to \infty} E(\omega_n^2) \right| \le 2\epsilon, \]

and now letting $m \to \infty$ (the $F_m(x)$ constructed below are such that
$k_m'(x) \to k'(x)$)

\[ \left| \left[ \int_0^1 e^{-k'(x)} \, dx \right]^2 - \lim_{n \to \infty} E(\omega_n^2) \right| \le 2\epsilon. \]
Since $\epsilon$ is arbitrary this implies (4.13), so that the theorem is extended to
include the possibility that $F'(x)$ is sometimes zero. That the sequence $F_m(x)$
can be chosen so that $\omega_{nm}$ converges to $\omega_n$ uniformly in $n$ can be
shown as follows. The set of points on the $x$ axis for which $F'(x) = 0$ maps into
a set of points on the $y$ axis of measure zero. For any $m$ we may enclose this
set on the $y$ axis in an open set $S$ of measure less than $\frac{1}{m}$. $S$ is the
union of disjoint open intervals $S_i$, $i = 1, 2, \cdots$. The sets
$T_i = F^{-1}(S_i)$ on the $x$ axis are disjoint open intervals. Now we may
construct a distribution function $F_m(x)$ which coincides with $F(x)$ outside
$\Sigma T_i$, is such that $F_m'(x) \ne 0$, and otherwise satisfies the conditions
of the theorem (stated explicitly in Theorem 3). The sequence $F_m(x)$ converges
to $F(x)$. Furthermore

\[ |\omega_{nm} - \omega_n| = \left| \frac{1}{2} \sum_{i=1}^{n+1} \left| F(x_i) - F(x_{i-1}) - \frac{1}{n+1} \right| - \frac{1}{2} \sum_{i=1}^{n+1} \left| F_m(x_i) - F_m(x_{i-1}) - \frac{1}{n+1} \right| \right| \]
\[ \le \frac{1}{2} \sum_{i=1}^{n+1} \left| \, \left| F(x_i) - F(x_{i-1}) - \frac{1}{n+1} \right| - \left| F_m(x_i) - F_m(x_{i-1}) - \frac{1}{n+1} \right| \, \right| \]
\[ \le \frac{1}{2} \sum_{i=1}^{n+1} \left| [F(x_i) - F(x_{i-1})] - [F_m(x_i) - F_m(x_{i-1})] \right|. \tag{4.14} \]

For any particular set of values of $x_1, x_2, \cdots, x_n$ some (possibly none or
possibly all) of the $x_i$ will fall into intervals of the $\Sigma T_i$. If this
finite set of intervals, each containing at least one $x_i$, is say
$T_1, T_2, \cdots, T_j$, then a simple analysis of the sum on the right of (4.14)
shows that it is less than twice the total length of the intervals
$F(T_1), F(T_2), \cdots, F(T_j)$ and this total length is less than $\frac{1}{m}$.
Thus $|\omega_{nm} - \omega_n| < \frac{1}{m}$ and this result is independent of $n$.
ON A PROBLEM IN THE THEORY OF k POPULATIONS¹


By Raghu Raj Bahadur


University of North Carolina


1. Summary. In two recent papers, Paulson [1] and Mosteller [2] have called 
attention to several unsolved problems in k-sample theory. A problem which is 
typical of the ones considered in this paper is as follows. 

Let $\pi_1, \pi_2, \cdots, \pi_k$ be a set of normal populations, $\pi_i$ having an
unknown mean $m_i$ and variance $\sigma^2$, $G(x, \theta_i)$ being the distribution
function which characterizes $\pi_i$. Samples of equal size are drawn from each
population, $\bar{X}_i$ being the sample means, and $S^2$ the estimate of $\sigma^2$
obtained. The problem is to construct a suitable decision rule
$d = d(\{\bar{X}_i\}; S^2)$ to select one or more populations, the object being to
minimize the expected value of the random distribution function

\[ G(x \mid s(d)) = \sum_{i=1}^{k} Z_i(d) \, G(x, \theta_i) \Big/ \sum_{i=1}^{k} Z_i(d), \]

where $Z_i(d) = 1$ if $\pi_i$ is selected by $d$, and $= 0$ otherwise. It is shown
that under the restriction of impartial decision, the rule $d_k$ = "Always select
only the population corresponding to the greatest $\bar{X}_i$" cannot be improved,
no matter what $x$ or the true parameter values may be. It follows (i) that $d_k$
is the uniformly best decision rule in the class of impartial decision rules for
all weight functions of type

\[ W = \max_i \{m_i\} - \left( \sum_{i=1}^{k} z_i m_i \Big/ \sum_{i=1}^{k} z_i \right), \]

and (ii) that the customary $F$ and $t$ tests of analysis of variance are not
relevant to the problem.

This result is an application of Theorem 1 which applies to a number of similar 
problems concerning k populations, especially when the populations admit 
sufficient statistics for their parameters. Two examples of statistical applications 
are given in Section 6. 


2. Introduction. It has been recognized for some time that the classical
theory of statistical inference does not provide direct answers to many problems
which are of great interest in the applications. One of them, which arises in
the theory of samples from $k$ populations, is what Mosteller has called "the
problem of the greatest one." The word "population" is used here for a process,
$\pi(\theta)$ say, which generates independent random variables $X_1, X_2, \cdots$,
each $X$ having the same distribution function $P(X < x) = G(x, \theta)$ say, and a
set of $X$'s which have been generated by $\pi$ is called a sample from the
population. We shall describe the problem, as also the formulation adopted in the
following section, in terms of two special cases. These cases occur when the $k$
given populations $\pi_1, \pi_2, \cdots, \pi_k$ are such that $\pi_i$ is
characterized by the distribution function
$G(x, \theta_i) = h\left( \frac{x - b_i}{c_i} \right)$, $\theta_i = (b_i, c_i)$,
$c_i > 0$, $i = 1, 2, \cdots, k$, where $h(z)$ is an absolutely continuous
non-decreasing function with $h(-\infty) = 0$, $h(+\infty) = 1$. Such sets of
populations appear frequently in statistical theory and practice, a given set of
normal, or rectangular, or gamma type populations being familiar instances.

¹ This paper is based on a thesis submitted to the Department of Mathematical
Statistics, University of North Carolina, in partial fulfilment of the
requirements for the Ph. D. degree. This work was sponsored by the Office of
Naval Research.





CASE 1. Let $X_{ij}$, $j = 1, 2, \cdots, n$ be a sample from the population
$\pi_i$, $i = 1, 2, \cdots, k$ where $\pi_i$ is characterized by the distribution
function $h\left( \frac{x - b_i}{c} \right)$, $b_i$ being unknown, and suppose that
the statistician is asked to select the population which he thinks has the
greatest $b_i$, but is allowed to select more than one population if (as a
consequence, say, of "insignificant" outcomes of tests of differences between
populations) he does not feel confident enough to select only one. This situation
will occur if, for example, the $X_{ij}$'s are observed yields in an agricultural
experiment in which each of $k$ varieties has been replicated $n$ times, the yield
with variety $\pi_i$ being normally distributed with unknown mean $m_i$ and
variance $\sigma^2$, and the statistician is asked to recommend one or more
varieties for general use. (Cf. Example 1 in Section 6.)

CASE 2. Suppose now that the $X_{ij}$'s are samples from populations $\pi_i$
characterized by distribution functions $h\left( \frac{x - b}{c_i} \right)$,
$c_i > 0$ unknown, $i = 1, 2, \cdots, k$, and the statistician is asked to select
the population which he thinks has the greatest $1/c_i$, but is allowed to select
more than one population.² This situation will occur if, for instance, the
$\pi_i$ are factories producing an article having a numerical quality
characteristic $X$, $h\left( \frac{x - b}{c_i} \right)$ being the distribution
function of $X$ in the product of $\pi_i$, and the statistician is required to
assign production to one or more factories, the object being to obtain product of
stable quality, $b$ being the standard characteristic.

It is clear that the usual statistical theory, which confines itself to estimation
of parameters $\theta_i$ and testing of hypotheses of the kind
$H_0(b_i = \text{constant})$, is inadequate to deal with problems of this sort,
where a definite course of action is required of the statistician. It is hardly
necessary to add that selection is an important problem in the applications, and
the testing of hypotheses is often an indirect attempt to justify selection. In
accordance with Wald's formulation³ of the problem of statistical inference, we
proceed to consider explicitly the purpose of selection and the "loss" involved in
making any particular selection.

² There is no essential difference between the problem of the greatest one and the
problem of the least one. In order to avoid trivial complications, the terminology
of the former will be used wherever possible.


3. A class of weight functions. Let $\pi_1, \pi_2, \cdots, \pi_k$ be a given set
of populations, $\pi_i$ being characterized by the distribution function
$G(x, \theta_i)$, and let us denote any particular selection, say $s$, by indicator
variables $z_1, z_2, \cdots, z_k$ where $z_i = 1$ if $\pi_i$ is selected and $= 0$
otherwise. Since any meaningful selection must concern itself with the random
variables generated by the populations selected, consider the function
$G(x \mid s) = \sum_{i=1}^{k} z_i G(x, \theta_i) \big/ \sum_{i=1}^{k} z_i$.
$G(x \mid s)$ is a distribution function, and provides a logical and direct overall
picture of the effect of making the selection $s$, since no distinction is made
between the populations selected. In immediate generalization, we define a
"selection" $s$ to be a vector, $s = (p_1, p_2, \cdots, p_k)$ with $p_i \ge 0$,
$\sum_{i=1}^{k} p_i = 1$, and put $G(x \mid s) = \sum_{i=1}^{k} p_i G(x, \theta_i)$.
Roughly speaking, $G(x \mid s)$ is the distribution function which characterizes the
mixed population obtained if sampling rates $p_1, p_2, \cdots, p_k$ are assigned to
$\pi_1, \pi_2, \cdots, \pi_k$ respectively, $p_r = 0$ corresponding to rejection of
$\pi_r$. Henceforth, a selection vector will be called a decision.
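The mixed distribution function $G(x \mid s) = \sum p_i G(x, \theta_i)$ is easy to evaluate directly. The Python sketch below is an added illustration for normal populations; the function names and the particular means and standard deviations are assumptions, not values from the paper.

```python
import math

def normal_cdf(x, mean, sd):
    """Distribution function of a normal population with the given mean and sd."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def mixture_cdf(x, p, params):
    """G(x | s) for a decision s = p, with params[i] = (mean, sd) of population i."""
    if any(pi < 0 for pi in p) or abs(sum(p) - 1.0) > 1e-12:
        raise ValueError("p must be non-negative and sum to 1")
    return sum(pi * normal_cdf(x, m, s) for pi, (m, s) in zip(p, params))

params = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]   # hypothetical populations
select_two = (0.0, 0.5, 0.5)                    # indicators z = (0, 1, 1), normalised
select_best = (0.0, 0.0, 1.0)                   # only the population with the largest mean
print(mixture_cdf(1.5, select_two, params), mixture_cdf(1.5, select_best, params))
```

Selecting only the population with the largest mean gives the smallest value of $G(x \mid s)$ at each $x$ in this example, which is the sense in which a decision is judged in Case 1.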

Now, if each of the $G(x, \theta_i)$'s were known, an appropriate decision $s$
could be chosen without resort to sampling. If not, the statistician must
construct (in advance) and use an $s$-valued function of the sample values. Such a
function, say $d$, is called a statistical decision function or decision rule. The
decision $s$ according to $d$, say $s(d) = (p_1(d), p_2(d), \cdots, p_k(d))$, is in
general a random vector, so that for any fixed $x$, $G(x \mid s(d))$ is a random
variable. Consider the distribution function
$H(x \mid d) = E[G(x \mid s(d))] = \sum_{i=1}^{k} G(x, \theta_i) E[p_i(d)]$, where
$E$ denotes the expectation operator. It represents the average overall effect of
using the decision rule $d$, and so affords a reasonable description of the
performance of $d$. Clearly, the problem is to construct $d$ in such a way that
$H(x \mid d)$ has desirable properties.

The "desirable properties" will depend, of course, on the particular problem
being considered. Returning to our two cases, denote the arbitrary but given set
of all possible parameter points $\omega = (\theta_1, \theta_2, \cdots, \theta_k)$
by $\Omega$, and let $D$ be a given class of decision rules $d = d(\{X_{ij}\})$.
Then, in Case 1 we wish to choose $d^* \in D$ such that
$H(x \mid d^*) = \inf_{d \in D} H(x \mid d)$ for every $x$ and every
$\omega \in \Omega$. In Case 2, we wish to choose $d^*$ so that for every $x$ and
every $\omega$ we have $H(x \mid d^*) = \inf_{d \in D} H(x \mid d)$ whenever
$x < b$, and $= \sup_{d \in D} H(x \mid d)$ whenever $x > b$. These requirements are
very strong, and in general no such $d^*$ will exist without heavy restrictions on
$\Omega$ and on $D$. (Cf. however the corollary to Theorem 1. It will be found
that in a number of cases no restrictions on $\Omega$ are required provided that
$D$ is the class defined there.) For some purposes, it may be sufficient to
consider functionals of $H(x \mid d)$. The functionals which are most useful in the
applications are the moments. Thus, one may wish to find $d^*$ such that
$a(d^*) = \sup_{d \in D} a(d)$, where
$a(d) = \int_{-\infty}^{+\infty} g(x) \, dH(x \mid d)$, $g(x)$ being some
appropriate function. For example, in Case 1 we may take $g(x) = x$. Then $a(d)$
is the mean of a random variable having $H(x \mid d)$ for its distribution
function, and constructing a suitable $d$ to maximize $a(d)$ is "the problem of the
greatest mean." Again, in Case 2 we may take $g(x) = -(x - b)^2$, and in that case
maximizing $a(d)$ would be "the problem of the smallest variance."⁴

³ See, for example, [3], Chapter VI.

In terms of mixtures of distributions, $H(x \mid d)$ is the mixture of
$G(x \mid s)$ with respect to $\delta$, where $\delta$ is the probability measure
induced by the decision rule $d$ on the class of Borel sets in the space of all
possible decisions $s$. It follows by the use of Theorem 5 in [4], or otherwise
directly, that maximizing $a(d)$ is equivalent to maximizing the expected value
($\delta$) of $\sum_{i=1}^{k} p_i \int_{-\infty}^{+\infty} g(x) \, dG(x, \theta_i)$.
Writing $g_i = \int_{-\infty}^{+\infty} g(x) \, dG(x, \theta_i)$, one may say that
the object is to construct $d$ in such a way that the expected value ($\delta$) of
the "weight function"

\[ W(\omega, s) = \max_i \{g_i\} - \sum_{i=1}^{k} p_i g_i \]

is minimized for every $\omega$. $W$ represents the "loss" incurred by choosing the
decision $s$ when the true parameter point is $\omega$. It will be seen that $W$
defined according to (A) in Section 5 includes essentially all weight functions
which are likely to be of interest in the type of problem considered in this paper.
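The weight function is a one-line computation once the $g_i$ are known. The following Python sketch is an added illustration; the function name `weight` and the numerical values of $g$ are assumptions.

```python
def weight(g, p):
    """W(omega, s) = max_i g_i - sum_i p_i g_i for a decision p (non-negative, summing to 1)."""
    return max(g) - sum(pi * gi for pi, gi in zip(p, g))

g = [0.0, 0.5, 1.0]                       # hypothetical values g_i, e.g. population means
print(weight(g, [0.0, 0.0, 1.0]))         # all weight on the best population: no loss
print(weight(g, [1 / 3, 1 / 3, 1 / 3]))   # equal sampling rates: loss 0.5
```

The loss is zero exactly when all the weight falls on populations attaining the maximum $g_i$, and it is never negative.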

We have so far not emphasized the obvious fact that the probability measure
$\delta$ which is induced by $d$ on the space of decisions will in general depend
on the unknown parameter point $\omega$. Therefore, the expected value ($\delta$)
of $W$ is to be written as $E[W(\omega, s(d)) \mid \omega] = r(d \mid \omega)$ say.
Following the usual terminology, we shall call $r(d \mid \omega)$ the risk function
of the rule $d$, and shall say that $d^* \in D$ is the uniformly best rule in the
class $D$ if $r(d^* \mid \omega) = \inf_{d \in D} r(d \mid \omega)$ for all
$\omega \in \Omega$.


4. A class of decision rules. The class of decision rules to which we shall
confine ourself is rather limited, and may be described as follows, with reference
to the previous sections:

(i) Given independent random variables $\{X_{ij}\}$, $j = 1, 2, \cdots, n$;
$i = 1, 2, \cdots, k$ from the $k$ populations $\pi_i$, let

\[ X_i = \phi(X_{i1}, X_{i2}, \cdots, X_{in}), \quad i = 1, 2, \cdots, k \quad \text{and} \quad Y = \psi(\{X_{ij}\}), \]

where $X_1, X_2, \cdots, X_k; Y$ is an independent set, and the $X_i$'s have
frequency functions. The choice of $\phi$ and $\psi$ will depend upon particular
cases: in Case 1, $X_1, \cdots, X_k; Y$ will be statistics relevant to the
estimation of $b_1, b_2, \cdots, b_k; c$ respectively, and in Case 2 they will be
relevant to $c_1, c_2, \cdots, c_k; b$.⁵

(ii) Given the statistics $\{X_i\}; Y$, $D(\phi; \psi)$ is the class of all
impartial decision rules which are based on them. A decision rule
$d = d(\{X_i\}; Y)$ is said to be impartial if it has the following structure. Let
$X_{(1)} < X_{(2)} < \cdots < X_{(k)}$ be the ordered $X_i$'s. Then $d$ defines
non-negative random variables $\lambda_j(X_{(1)}, X_{(2)}, \cdots, X_{(k)}; Y)$,
$j = 1, 2, \cdots, k$ such that $\sum_{j=1}^{k} \lambda_j = 1$, and $\lambda_j$ is
the proportion $p(d)$ which is assigned by $d$ to the $\pi$ corresponding to
$X_{(j)}$. We use the term "impartial" for such decision rules because they
determine the proportions $[\lambda_1, \lambda_2, \cdots, \lambda_k]$ without
regard to which $X$ belongs to which population, and then assign these proportions
in strict order of the $X_i$'s.

⁴ An unpublished theorem of Herbert Robbins insures that if a $d^*$ satisfies the
strong requirements of the preceding paragraph, it will also maximize all
functionals $a(d)$ corresponding to such functions $g(x)$.
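The structure described in (ii) can be made concrete as follows. In this added Python sketch, `apply_impartial_rule` is a hypothetical helper: it takes the statistics $X_1, \cdots, X_k$ and a vector of proportions $\lambda_1, \cdots, \lambda_k$ (already computed from the ordered values), and hands $\lambda_j$ to the population whose statistic is the $j$th smallest.

```python
def apply_impartial_rule(x, lam):
    """Return the decision (p_1, ..., p_k): lam[j] goes to the population with the (j+1)-th smallest x."""
    order = sorted(range(len(x)), key=lambda i: x[i])   # order[j] = index of X_(j+1)
    p = [0.0] * len(x)
    for rank, i in enumerate(order):
        p[i] = lam[rank]
    return p

# Example: proportions 2/3 and 1/3 for the greatest and second greatest statistic.
x = [2.0, 5.0, 3.0]
lam = [0.0, 1 / 3, 2 / 3]
print(apply_impartial_rule(x, lam))        # population 2 (x = 5.0) receives 2/3
```

Relabelling the populations permutes the resulting decision in the same way, which is the impartiality property.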

We shall specify the intuitively plausible class of impartial decision rules for
the important normal cases, and give a few instances of such rules.

Suppose first that the $X_{ij}$'s are from normal populations having means $m_i$
and a common variance $\sigma^2$, and that we are interested in the problem of the
greatest mean. $D$ is then the class of all impartial decision rules which are
based on the statistics

\[ X_i = \bar{X}_i = \sum_{j=1}^{n} X_{ij} / n, \quad i = 1, 2, \cdots, k; \]
\[ Y = S^2 = \sum_{i=1}^{k} \sum_{j=1}^{n} (X_{ij} - \bar{X}_i)^2 / k(n - 1). \]

The numerical factors are of no importance, and may be omitted (Cf. footnote 4.
See also Example 2 in Section 6, where such factors have been omitted for
convenience). A rather simple member of $D$ is the rule
$[\lambda_{k-1} = 1/3, \lambda_k = 2/3]$ i.e. "Always assign the proportion 2/3 to
the population which has the greatest $\bar{X}_i$, and the proportion 1/3 to the
population with the second greatest." In using this rule although the
$\lambda_j$'s remain constant from sample to sample, the decision $s(d)$ is a
random vector. In general, however, the $\lambda_j$'s will themselves be random
variables. This is the case if, for instance, one insists on utilising the
standard test of differences between populations, and uses the impartial rule
"Perform the $F$ test of $H_0(m_i = \text{constant})$ at the five per cent level.
If $H_0$ is rejected, assign the proportion 1 to the population which has the
greatest $\bar{X}_i$. If not, assign equal proportions to all populations for
which $\bar{X}_i > \sum_{r=1}^{k} \bar{X}_r / k$, and zero proportions to the
rest." Another type of impartial decision rule according to which the
$\lambda_j$'s are random variables will be described at the end of Example 1 in
the next section. Now, it is (intuitively) clear that if the sample size $n$ is
indefinitely large, the rule $[\lambda_k = 1]$, i.e., "Always assign the
proportion 1 to the population with the greatest $\bar{X}_i$", cannot be improved,
no matter what the true parameter values may be. Our main result (Theorem 1)
asserts that the statement is in fact valid for any $n$, provided that one
restricts oneself to the class of impartial decision rules.

⁵ It is unnecessary to specify here the exact relation between the statistics and
the parameters: (a) the definition of the parameter which determines a
distribution function $G(x, \theta)$ is more or less arbitrary, e.g., instead of
writing $\theta = (b, c)$ we may write $\theta = (b^3/c, \cosh c)$, and
(b) $D(\phi_1; \psi_1) = D(\phi_2; \psi_2)$, provided that $\phi_2 = f(\phi_1)$,
$\psi_2 = g(\psi_1)$, where $f(x)$, $g(x)$ are strictly monotonic functions. It
will be seen that Theorem 1 is invariant under such transformations of parameters
and/or of statistics.
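The claim about the rule $[\lambda_k = 1]$ can be explored by simulation. The Python sketch below is an added illustration, not a proof: the means, the sample size, the number of replications, and the seed are assumptions, and only four fixed-proportion impartial rules are compared. It estimates the risk $E[\max m_i - \sum p_i m_i]$ of each rule, using the same random samples for all of them.

```python
import random

def apply_impartial_rule(x, lam):
    """Assign lam[j] to the population whose statistic is the (j+1)-th smallest."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    p = [0.0] * len(x)
    for rank, i in enumerate(order):
        p[i] = lam[rank]
    return p

def estimated_risk(lam, means, sigma, n, reps, seed):
    """Monte Carlo estimate of E[max m_i - sum p_i m_i] for normal populations."""
    rng = random.Random(seed)            # same seed for every rule: common random samples
    best = max(means)
    total = 0.0
    for _ in range(reps):
        xbar = [sum(rng.gauss(m, sigma) for _ in range(n)) / n for m in means]
        p = apply_impartial_rule(xbar, lam)
        total += best - sum(pi * mi for pi, mi in zip(p, means))
    return total / reps

means = [0.0, 0.5, 1.0]                  # hypothetical true means
rules = {
    "greatest only": [0.0, 0.0, 1.0],
    "1/3 and 2/3": [0.0, 1 / 3, 2 / 3],
    "equal": [1 / 3, 1 / 3, 1 / 3],
    "least only": [1.0, 0.0, 0.0],
}
risks = {name: estimated_risk(lam, means, 1.0, 5, 4000, seed=7) for name, lam in rules.items()}
print(risks)
```

In this example the estimated risks increase from "greatest only" to "least only", which is the ordering that Theorem 1 predicts for the two extreme rules.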

In a similar way, if the $X_{ij}$'s are from normal populations having a common
mean $m$ and variances $\sigma_i^2$, $D$ would be the class of all impartial
decision rules which are based on the statistics

\[ X_i = S_i^2 = \sum_{j=1}^{n} (X_{ij} - \bar{X}_i)^2 / (n - 1), \quad i = 1, 2, \cdots, k; \]
\[ Y = \sum_{i=1}^{k} \sum_{j=1}^{n} X_{ij} / kn, \]

and analogous remarks will apply to this case.

It should be observed that in a given case the appropriate statistics
$\{X_i\}; Y$ may not be as obvious as in the case of populations like the normal
which admit sufficient statistics for their parameters. This real difficulty is
not to be confused with the ambiguities mentioned in footnote 4. Furthermore,
given the $X_i$'s there may not exist $Y = \psi(\{X_{ij}\})$ which is independent
of the $X_i$'s: we shall then assume, without invalidating our result, that the
parameter which $Y$ is supposed to estimate is known. Theorem 1 becomes operative
only after such questions have been resolved.


5. The uniformly best decision rule. It is convenient to define here some
terms which will be used subsequently without further explanation. All functions
are assumed to be Borel measurable. Sets will be denoted by curly brackets: thus
$\{f = c\}$ is the set on which $f = c$ holds, and $\{a_i\}$ is the set of all
$a_i$ in question. "Measure" will refer to ordinary Lebesgue measure in the $xy$
plane.

DEFINITION 1. Given $k$ independent random variables $X_i$, $i = 1, 2, \cdots, k$
such that each $X$ has a frequency function, let $X_{(j)}$, $j = 1, 2, \cdots, k$,
be the ordered set, $X_{(j)}$ being the $j$th $X_i$ in ascending order of
magnitude. Then $A_{ij} = \{X_i = X_{(j)}\}$, and $a_{ij}$ is the characteristic
function of the set $A_{ij}$, that is, $a_{ij} = 1$ for any point of $A_{ij}$ and
$= 0$ elsewhere.

Since the $X_i$'s have a joint distribution which is absolutely continuous, the
sets $A_{ij}$ are well defined with probability one. Clearly, we have
$\sum_{i=1}^{k} a_{ij} = 1$ for every $j$ and $\sum_{j=1}^{k} a_{ij} = 1$ for every
$i$, with probability one.
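For a given realisation without ties, the indicators $a_{ij}$ of Definition 1 form a permutation matrix. The Python sketch below is an added illustration (the helper name `rank_indicators` is hypothetical); it builds that matrix and checks the two summation properties.

```python
def rank_indicators(x):
    """Return a with a[i][j] = 1 when x[i] is the (j+1)-th smallest value, else 0 (no ties assumed)."""
    k = len(x)
    order = sorted(range(k), key=lambda i: x[i])
    a = [[0] * k for _ in range(k)]
    for j, i in enumerate(order):
        a[i][j] = 1
    return a

a = rank_indicators([2.0, 5.0, 3.0])
row_sums = [sum(row) for row in a]                               # sum over j for each i
col_sums = [sum(a[i][j] for i in range(3)) for j in range(3)]    # sum over i for each j
print(a, row_sums, col_sums)
```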

DEFINITION 2. Let $\beta = (b_1, b_2, \cdots, b_k)$ be a vector of real numbers
$b_i$, and $\phi = (f_1, f_2, \cdots, f_k)$ a vector of real-valued functions
$f_i(x)$ defined for every real $x$. We shall say that $\phi \in T(\beta)$ if for
any $r, s = 1, 2, \cdots, k$ for which $b_r < b_s$, the set
$\{f_r(x) f_s(y) < f_r(y) f_s(x), \; x < y\}$ is of measure zero.
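Definition 2 is a monotone likelihood ratio condition, and normal densities with a common variance satisfy it when ordered by their means. The Python sketch below is an added illustration that checks the defining inequality on a finite grid only (so it can detect violations but cannot prove membership in $T(\beta)$); the function names, grid, and tolerance are assumptions.

```python
import math

def normal_pdf(x, mean, sd=1.0):
    """Frequency function of a normal population."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def violates_T(f_r, f_s, grid, tol=1e-15):
    """True if f_r(x) f_s(y) < f_r(y) f_s(x) for some grid pair x < y."""
    return any(
        f_r(x) * f_s(y) < f_r(y) * f_s(x) - tol
        for i, x in enumerate(grid)
        for y in grid[i + 1:]
    )

grid = [-3.0 + 0.25 * j for j in range(25)]          # points from -3 to 3

def f0(x):
    return normal_pdf(x, 0.0)                        # b_r = 0

def f1(x):
    return normal_pdf(x, 1.0)                        # b_s = 1

print(violates_T(f0, f1, grid), violates_T(f1, f0, grid))
```

With the densities taken in the order of their means no violation is found, while reversing the order violates the inequality at every grid pair.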

We require the following

LEMMA. Suppose that $X_1, X_2, \cdots, X_k; Y$ are independent random variables,
$X_i$ having a frequency function $f_i(x)$ and that
$\phi = (f_1, f_2, \cdots, f_k) \in T(\beta)$, where
$\beta = (b_1, b_2, \cdots, b_k)$ with

\[ b_1 \le b_2 \le \cdots \le b_k. \tag{1} \]

Then, for any non-negative random variable
$\lambda = \lambda(X_{(1)}, X_{(2)}, \cdots, X_{(k)}; Y)$ and any
$p, q, m = 1, 2, \cdots, k$ with $p \le q$, we have

\[ \sum_{i=m}^{k} E(\lambda a_{ip}) \le \sum_{i=m}^{k} E(\lambda a_{iq}). \tag{2} \]

PROOF. Since (2) holds trivially if $p = q$ or if $m = 1$ suppose that $p < q$
and $m \ge 2$. Writing
$B(m, j) = \left\{ \sum_{i=m}^{k} a_{ij} = 1 \right\} = \sum_{i=m}^{k} A_{ij}$, (2)
is equivalent to $\int_{B(m,q)} \lambda \, dP \ge \int_{B(m,p)} \lambda \, dP$, and
hence to

\[ \int_{B(m,q) B'(m,p)} \lambda \, dP \ge \int_{B(m,p) B'(m,q)} \lambda \, dP, \tag{3} \]

where $B'$ denotes the complement of $B$, and $P$ the probability measure in
$(x_1, x_2, \cdots, x_k; y)$ space.

For any permutation $i_1 i_2 \cdots i_k$ of $1 2 3 \cdots k$, define
$O(i_1 i_2 \cdots i_k) = A_{i_1 1} A_{i_2 2} \cdots A_{i_k k}$. Clearly, the $O$'s
corresponding to different permutations are disjoint and each of the sets
$B(m, q) B'(m, p)$ and $B(m, p) B'(m, q)$ is the set-theoretic sum of certain
$O$'s. Now, it is easy to see that

\[ O \subset B(m, q) B'(m, p) \text{ if and only if } \begin{cases} i_p = 1, \text{ or } 2, \cdots, \text{ or } m - 1, \text{ and} \\ i_q = m, \text{ or } m + 1, \cdots, \text{ or } k; \end{cases} \tag{4} \]
\[ O^* \subset B(m, p) B'(m, q) \text{ if and only if } \begin{cases} i_p^* = m, \text{ or } m + 1, \cdots, \text{ or } k, \text{ and} \\ i_q^* = 1, \text{ or } 2, \cdots, \text{ or } m - 1. \end{cases} \]

Hence a one-one correspondence between subsets $O(i_1 \cdots i_k)$ of
$B(m, q) B'(m, p)$ and subsets $O^* = O(i_1^* \cdots i_k^*)$ of $B(m, p) B'(m, q)$
exists through interchange of the $p$th and $q$th elements of the defining
permutations, the other elements remaining the same. It will be sufficient to
prove that if $O$ and $O^*$ are any pair of corresponding subsets, the integral of
$\lambda$ over $O$ is greater than or equal to its integral over $O^*$, for then
(3) will follow by addition.

It is clear that for any $O$,

\[ \int_{O(i_1 i_2 \cdots i_k)} \lambda \, dP = \int \int_{\{x_{i_1} < x_{i_2} < \cdots < x_{i_k}\}} \lambda(x_{i_1}, x_{i_2}, \cdots, x_{i_k}; y) \left[ \prod_{r=1}^{k} f_{i_r}(x_{i_r}) \, dx_{i_r} \right] dF(y) \]
\[ = \int \int_R \lambda(t_1, t_2, \cdots, t_k; y) \left[ \prod_{r=1}^{k} f_{i_r}(t_r) \, dt_r \right] dF(y), \tag{5} \]

where $R$ is the domain $\{t_1 < t_2 < \cdots < t_k\}$ and $F(y)$ is the
distribution function of $Y$. Let $O$ and $O^*$ be any pair of corresponding
subsets. It follows from (5) that

\[ \int_O \lambda \, dP - \int_{O^*} \lambda \, dP = \int \int_R Q \left[ \prod_{r \ne p, q} f_{i_r}(t_r) \right] \prod_{r=1}^{k} dt_r \, dF(y), \]

where

\[ Q = \lambda(t_1, t_2, \cdots, t_k; y) \left[ f_{i_p}(t_p) f_{i_q}(t_q) - f_{i_q}(t_p) f_{i_p}(t_q) \right]. \tag{6} \]

From (4) and (1) we have $b_{i_p} \le b_{i_q}$. Since $p < q$ implies that
$t_p < t_q$ over $R$, and $\phi \in T(\beta)$, it follows that the expression in
square brackets in (6) is (except perhaps for a set of measure zero) non-negative
over $R$. Since $\lambda$ is also non-negative, it follows that $Q$ is
non-negative over $R$, and the Lemma is proved.

We shall now state and prove the main result. Note that the statistic Y is not 
necessarily real-valued. 

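THEOREM 1. Suppose that

(A). $\Omega$ is a given set of points $\omega = (\theta_1, \theta_2, \cdots, \theta_k)$.
$\beta(\omega) = (b_1, b_2, \cdots, b_k)$ and $\gamma(\omega) = (g_1, g_2, \cdots, g_k)$ are defined
for every $\omega$ such that $b_p < b_q$ implies $g_p \le g_q$ for every $p, q = 1, 2, \cdots, k$.
Given any $s = (p_1, p_2, \cdots, p_k)$ with $p_i \ge 0$, $\sum_{i=1}^{k} p_i = 1$,

$$W(\omega, s) = \max_i \{g_i\} - \sum_{i=1}^{k} p_i g_i.$$

(B). $X_1, X_2, \cdots, X_k; Y$ are independent random variables, each $X_i$ having
a frequency function $f(x, \theta_i) = f_i(x)$ say, and $\phi(\omega) = (f_1, f_2, \cdots, f_k)$.

(C). $D$ is the class of all decision rules $d$ such that
$d = d(X_{(1)}, X_{(2)}, \cdots, X_{(k)}; Y) = [\lambda_1, \lambda_2, \cdots, \lambda_k]$, $\lambda_j \ge 0$,
$\sum_{j=1}^{k} \lambda_j = 1$, and $s(d) = (p_1(d), p_2(d), \cdots, p_k(d))$ where
$p_i(d) = \sum_{j=1}^{k} \lambda_j a_{ij}$, $i = 1, 2, \cdots, k$.
Given $d \in D$, $r(d \mid \omega) = E[W(\omega, s(d)) \mid \omega]$.

(D). For every $\omega$, $\phi \in T(\beta)$.^6

Then, for every $\omega$, $r(d_1 \mid \omega) = \sup_{d \in D} r(d \mid \omega)$ and
$r(d_k \mid \omega) = \inf_{d \in D} r(d \mid \omega)$, where
$d_1 = [1, 0, 0, \cdots, 0]$ and $d_k = [0, 0, \cdots, 0, 1]$.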
COROLLARY. Suppose that $\pi_i$, $i = 1, 2, \cdots, k$ are populations characterized
by distribution functions $G(x, \theta_i) = h\big(\frac{x - b_i}{c_i}\big)$, $c_i > 0$. For any fixed $x$, let

$$G(x \mid \omega, s) = \sum_{i=1}^{k} p_i G(x, \theta_i),$$

and $H(x \mid d, \omega) = E[G(x \mid \omega, s(d)) \mid \omega]$.

CASE 1. If for every $\omega$, (i) $c_1 = c_2 = \cdots = c_k$,
(ii) $\phi \in T(\beta)$, where $\beta = (b_1, b_2, \cdots, b_k)$,
then, for every $\omega$,

$$H(x \mid d_k, \omega) = \inf_{d \in D} H(x \mid d, \omega).$$

CASE 2. If for every $\omega$, (i) $b_1 = b_2 = \cdots = b_k = b(\omega)$, say,
(ii) $\phi \in T(\beta)$, where $\beta = (c_1, c_2, \cdots, c_k)$,
then, for every $\omega$,

$$H(x \mid d_1, \omega) = \begin{cases}
\inf_{d \in D} H(x \mid d, \omega) & \text{if } x < b(\omega), \\
\sup_{d \in D} H(x \mid d, \omega) & \text{if } x > b(\omega). \end{cases}$$

6 Note that $\phi \in T(-\beta)$ is equivalent to $\phi^* \in T(\beta)$, where
$\phi^* = (f^*_1, f^*_2, \cdots, f^*_k)$, and $f^*_i$ is the frequency function of $X^*_i = -X_i$.


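PROOF. Choose and fix an arbitrary $\omega \in \Omega$. Without loss of generality we may
assume the notation to be so chosen (by simultaneous interchanges of indices $i$
in each of $\{\theta_i\}$, $\{b_i\}$, $\{g_i\}$, $\{p_i\}$, $\{X_i\}$, $\{f_i\}$, and $\{a_{ij}\}$, $j = 1, 2, \cdots, k$) that (1)
holds. It then follows that $g_1 \le g_2 \le \cdots \le g_k$ and we write

$$g_i = g_1 + h_1 + h_2 + \cdots + h_i, \quad h_i \ge 0, \quad i = 1, 2, \cdots, k. \tag{7}$$

Choose and fix an arbitrary member of the class of impartial decision rules,
say $d = [\lambda_1, \lambda_2, \cdots, \lambda_k]$. We have

$$r(d \mid \omega) = \max_i \{g_i\} - \sum_{i,j=1}^{k} g_i E(\lambda_j a_{ij}) \tag{8}$$

and

$$\sum_{i,j=1}^{k} g_i E(\lambda_j a_{ij}) = \sum_{i,j=1}^{k} (g_1 + h_1 + \cdots + h_i) E(\lambda_j a_{ij})
= g_1 + \sum_{m,j=1}^{k} h_m \Big[ \sum_{i=m}^{k} E(\lambda_j a_{ij}) \Big]. \tag{9}$$

Since $\lambda_j = \lambda_j(X_{(1)}, X_{(2)}, \cdots, X_{(k)}; Y) \ge 0$, it follows from the Lemma that

$$\sum_{i=m}^{k} E(\lambda_j a_{ij}) \le \sum_{i=m}^{k} E(\lambda_j a_{ik}) \quad \text{for every } m \text{ and every } j, \tag{10}$$

by writing $\lambda = \lambda_j$, $p = j$, and $q = k$ in (2). By using (7), (9) and (10) it follows
that

$$\sum_{i,j=1}^{k} g_i E(\lambda_j a_{ij}) \le g_1 + \sum_{m,j=1}^{k} \Big[ \sum_{i=m}^{k} E(\lambda_j a_{ik}) \Big] h_m
= g_1 + \sum_{m=1}^{k} \sum_{i=m}^{k} h_m E(a_{ik})
= \sum_{i=1}^{k} g_i E(a_{ik}). \tag{11}$$

Therefore, by (8) and (11),

$$r(d \mid \omega) \ge \max_i \{g_i\} - \sum_{i=1}^{k} g_i E(a_{ik}) = r(d_k \mid \omega), \tag{12}$$

by definition of $d_k$. The inequality $r(d \mid \omega) \le r(d_1 \mid \omega)$ follows from (8) and (9)
by a similar use of the Lemma. Since both $d \in D$ and $\omega \in \Omega$ are arbitrary, this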


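completes the proof of Theorem 1.

The verification of the corollary is as follows. Choose and fix an arbitrary $x$
and write $h\big(\frac{x - b_i}{c_i}\big) = t_i(\omega)$.

Case 1. Let $\gamma(\omega) = (1 - t_1, 1 - t_2, \cdots, 1 - t_k)$. Then
$r(d \mid \omega) = H(x \mid d, \omega) - \min_i \{t_i\}$, and it follows from the Theorem that
$H(x \mid d_1, \omega) = \sup_{d \in D} H(x \mid d, \omega)$ and
$H(x \mid d_k, \omega) = \inf_{d \in D} H(x \mid d, \omega)$, for all $\omega$.

Case 2. Let

$$\gamma(\omega) = \begin{cases} (t_1, t_2, \cdots, t_k) & \text{if } b(\omega) > x, \\
(1 - t_1, 1 - t_2, \cdots, 1 - t_k) & \text{otherwise.} \end{cases}$$

Then we have

$$r(d \mid \omega) = \begin{cases} \max_i \{t_i\} - H(x \mid d, \omega) & \text{if } b(\omega) > x, \\
H(x \mid d, \omega) - \min_i \{t_i\} & \text{otherwise,} \end{cases}$$

so that

$$H(x \mid d_1, \omega) = \begin{cases} \inf_{d \in D} H(x \mid d, \omega) & \text{if } b(\omega) > x, \\
\sup_{d \in D} H(x \mid d, \omega) & \text{otherwise,} \end{cases}$$

and conversely for $H(x \mid d_k, \omega)$, for all $\omega$.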

The preceding proofs suggest that perhaps (D) is not a necessary condition, 
but the following theorem for the case of two populations shows that it is indis- 
pensable if Theorem 1 is to hold in general. 

THEOREM 2. Suppose that (A), (B), and (C) hold with $k = 2$ and $\theta_1, \theta_2$ real-valued,
that the set $\Omega$ of points $\omega = (\theta_1, \theta_2)$ is denumerable, that $\beta(\omega) = \omega$, that
$g_1 \ne g_2$ for any $\omega$, and that $Y$ is a fixed constant. Let $\mu(\omega) = \min_i \{\theta_i\}$,
$\nu(\omega) = \max_i \{\theta_i\}$, and defining the sets

$$R(\omega) = \{ f(t_1, \mu) f(t_2, \nu) < f(t_1, \nu) f(t_2, \mu), \; t_1 < t_2 \},$$

$$S(\omega) = \{ f(t_1, \mu) f(t_2, \nu) > f(t_1, \nu) f(t_2, \mu), \; t_1 < t_2 \}$$

in the $t_1 t_2$-plane, put

$$R^*(t_1, t_2) = \sum_{\omega} R(\omega), \qquad S^*(t_1, t_2) = \sum_{\omega} S(\omega).$$

Then a uniformly best decision rule in the class $D$ exists if and only if the set $R^* S^*$
is of measure zero. Subject to existence, the uniformly best rule, say $d^*$, may be
defined as

$$d^* = \begin{cases} [1, 0] & \text{if } (X_{(1)}, X_{(2)}) \in R^*, \\ [0, 1] & \text{otherwise.} \end{cases}$$


The proof is quite simple, and will not be given. It is clear that under the 
hypotheses of this theorem, the conclusion of Theorem 1 is valid if and only if 
the set R* is of measure zero, that is, if and only if condition (D) holds.

6. Examples and discussion. We begin with two applications of Theorem 1.


EXAMPLE 1. Suppose that grain is to be raised on a given area, say $A$, of land.
$k$ varieties, $\pi_1, \pi_2, \cdots, \pi_k$ say, are available, the yields per unit area being
normally distributed with unknown means $m_i$ and a common variance $\sigma^2$, also
unknown. A preliminary field experiment (in which $n$ plots of unit area were
assigned to each variety) has been carried out, and $\{X_{ij}\}$, $j = 1, 2, \cdots, n$;
$i = 1, 2, \cdots, k$ is the set of independent plot-yields obtained. The statistician
is asked to suggest how the available land should be divided between the $k$
varieties, the object being to make the total expected yield as large as possible.^7

Suppose that an area $A p_i$ is assigned to $\pi_i$, $i = 1, 2, \cdots, k$, with $\sum_{i=1}^{k} p_i = 1$.
Then the expected total yield is $\sum_{i=1}^{k} A p_i m_i$. Our object is to choose the set
$(p_1, p_2, \cdots, p_k) = s$ so as to minimize the “loss”

$$W(\omega, s) = \max_i \{A m_i\} - \sum_{i=1}^{k} A m_i p_i.$$

Since the $m_i$'s are unknown, one must construct an appropriate $s$-valued function
of the $X_{ij}$'s, say $d$, and set $s(d) = d(\{X_{ij}\})$. The expected “loss” in using
this procedure is given by $E[W(\omega, s(d)) \mid \omega] = r(d \mid \omega)$, and the problem is to
construct a $d$ which makes $r(d \mid \omega)$ as small as possible. (See (A) and (C). Here
we have set $\theta_i = (m_i, \sigma)$, $\omega = (\theta_1, \theta_2, \cdots, \theta_k)$, $\beta(\omega) = (m_1, m_2, \cdots, m_k)$ and
$\gamma(\omega) = (A m_1, A m_2, \cdots, A m_k)$.)

Let $\bar{X}_i = \sum_{j=1}^{n} X_{ij}/n$, $i = 1, 2, \cdots, k$ and
$S^2 = \sum_{i=1}^{k} \sum_{j=1}^{n} (X_{ij} - \bar{X}_i)^2 / k(n - 1)$.

Since $\bar{X}_1, \bar{X}_2, \cdots, \bar{X}_k; S^2$ is a set of sufficient statistics, it is easy to see
by taking conditional expectations that corresponding to any decision rule based
on the $X_{ij}$'s, there exists one defined in terms of the $\bar{X}_i$'s and $S^2$ alone such that
the risk functions $r$ of the two are identically equal for all possible values of
the unknown parameters. Clearly, one may confine oneself to decision rules of
the type $d = s(\{\bar{X}_i\}; S^2)$. Now, the frequency function of $\bar{X}_i$ is
$f_i(x) = (n/2\pi\sigma^2)^{1/2} \exp[-n(x - m_i)^2/2\sigma^2]$, and it is readily seen that $m_r < m_s$ and
$x < y$ imply $f_r(x) f_s(y) \ge f_s(x) f_r(y)$. It follows that in the class of all impartial
procedures which are based on $\{\bar{X}_i\}; S^2$, the uniformly best procedure is to assign
the whole area $A$ to the variety with the greatest observed yield. (Note that
by the corollary to Theorem 1, a much stronger result than the one required
here holds. Cf. footnote 3.)
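
As a numerical illustration of Example 1 (not part of the original argument), the following minimal sketch takes $k = 2$ and treats $\sigma$ as known so that the risks have a closed form. It compares three impartial rules: all land to the variety with the greater sample mean ($d_k$), all land to the variety with the smaller sample mean ($d_1$), and an equal split $[1/2, 1/2]$. The function names and the numerical values are assumptions chosen for illustration only.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def risks_two_varieties(area, m1, m2, sigma, n):
    """Expected losses r(d | omega) of three impartial rules when k = 2.

    Assumes normal plot yields with means m1, m2, common standard
    deviation sigma, and n plots per variety (illustrative setting).
    """
    # Loss incurred when the whole area goes to the worse variety.
    gap = area * abs(m1 - m2)
    # The difference of the two sample means is normal with mean |m1 - m2|
    # and standard deviation sigma * sqrt(2 / n); the worse variety shows
    # the greater sample mean with this probability.
    p_wrong = normal_cdf(-abs(m1 - m2) / (sigma * math.sqrt(2.0 / n)))
    return {
        "greatest_mean": gap * p_wrong,           # rule d_k
        "equal_split": gap * 0.5,                 # rule [1/2, 1/2]
        "smallest_mean": gap * (1.0 - p_wrong),   # rule d_1
    }


risks = risks_two_varieties(area=100.0, m1=10.0, m2=11.0, sigma=2.0, n=4)
for rule, value in risks.items():
    print(f"{rule}: {value:.3f}")
```

In this setting the ordering of the three risks agrees with Theorem 1: $d_k$ gives the smallest risk and $d_1$ the largest, with the equal split in between.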

Although Paulson did not set up a weight function in his discussion of the
selection problem for the present case of samples of equal size from $k$ normal
populations having unknown means and a common variance, also unknown, he
gave a class $\{d_c\}$ of decision rules and evaluated some probabilities (P(G;) and
P?. [1], pp. 96-97) which suggest that some of the applications he had in mind
are similar to the one given here. In our notation, the rule $d_c$ is defined as follows
for any given $c > 0$,

$$d_c = [\lambda_1, \lambda_2, \cdots, \lambda_k], \quad \text{where } \lambda_j = a_j \Big/ \sum_{j=1}^{k} a_j, \quad j = 1, 2, \cdots, k,$$

$$\text{with } a_j = \begin{cases} 1 & \text{if } X_{(k)} - c(S/\sqrt{n}) \le X_{(j)} \le X_{(k)}, \\ 0 & \text{otherwise.} \end{cases}$$

7 A double expectation is involved: the expected consequence of a given decision, and
the expected decision in using a particular decision rule. The one given is justified
since it is assumed that the random variables generated by the $\pi$'s subsequent to decision
are independent of the random variables on which decision is based. Cf. Section 3. This
remark applies to Example 2 also.

EXAMPLE 2. Suppose that a manufactured article has a numerical characteristic
$x$, and a given article is “defective” if it has an $x < a$ and “acceptable” otherwise,
where $a$ is some constant. A consumer requires a large number ($N$) of articles,
which can be supplied by each one of $k$ manufacturers $\pi_i$, $i = 1, 2, \cdots, k$. The
characteristic (say length) of articles produced by $\pi_i$ is known to have a rectangular
distribution with range from $b$ to $b + c_i$, but the $c_i$'s are not known. As a
preliminary step, the consumer has obtained samples of $\nu$ articles from each
manufacturer, and finds the corresponding lengths to be $X_{ij}$, $j = 1, 2, \cdots, \nu$;
$i = 1, 2, \cdots, k$. The statistician is asked to suggest how the consumer should
order a total of $N$ articles from the $k$ manufacturers.

If $a \le b$, the number of defective articles received by the consumer will be
zero no matter how the order is placed. Suppose therefore that $a > b$. Then, if
$n_i$ articles are ordered from $\pi_i$ with $\sum_{i=1}^{k} n_i = N$, the expected number of
defectives equals $N - \sum_{i=1}^{k} (n_i/N) \cdot g_i$, where $g_i = g(c_i)$ and $g(t)$ is given by

$$g(t) = \begin{cases} N \Big( 1 - \dfrac{a - b}{t} \Big) & \text{if } t > a - b, \\ 0 & \text{otherwise.} \end{cases}$$

Writing $\beta(\omega) = (c_1, c_2, \cdots, c_k)$, $\gamma(\omega) = (g_1, g_2, \cdots, g_k)$, it is clear that the
expected number of defectives is of the form $W(\omega, s) + h(\omega)$, where $h(\omega)$ is
independent of $s = (n_1/N, n_2/N, \cdots, n_k/N)$, and $W$ is defined as in (A).

We have now to consider what statistics $X_i$ should be used to construct decision
rules. Evidently, we are concerned with a “problem of the greatest $c_i$.”

(a). Assuming $\nu > 1$, let $X_i = \max_j \{X_{ij}\} - \min_j \{X_{ij}\}$. Since the frequency
function of $X_i$ is $f_i(x) = \nu(\nu - 1) c_i^{-\nu} (c_i - x) x^{\nu - 2}$ if $0 < x < c_i$ and zero
elsewhere, it is a simple matter to show that $c_r < c_s$, $x < y$ imply
$f_r(x) f_s(y) \ge f_s(x) f_r(y)$. It follows that in the class of all impartial rules which are
based on the sample ranges, the uniformly best rule is to order all the $N$ articles from
the manufacturer with the greatest sample range.

(b). It may be objected that since the lower end points of all the distributions
are the same, the use of sample ranges to construct decision rules is not particularly
appropriate. Suppose therefore that one takes the statistics
$X^*_i = \max_j \{X_{ij}\} - b$. The frequency function of $X^*_i$ is
$f^*_i(x) = \nu c_i^{-\nu} x^{\nu - 1}$ for $0 < x < c_i$, and $= 0$ elsewhere, and as before,
condition (D) holds. Hence the uniformly best impartial procedure in this class is to
order all the $N$ articles from the manufacturer who supplied the article with the
greatest length in the whole sample of $k\nu$ articles.
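
The inequality behind condition (D) in (a) and (b) can be spot-checked numerically. The sketch below is an illustration under assumed values ($\nu = 3$, $c_r = 1.5$, $c_s = 3$, and an arbitrary grid); it evaluates $f_r(x) f_s(y) \ge f_s(x) f_r(y)$ for all grid pairs $x < y$, using the two frequency functions quoted above. A grid check of this kind is only a sanity test, not a proof, and the function names are hypothetical.

```python
def range_density(x, c, nu):
    """Frequency function of the sample range in (a): nu(nu-1) c^-nu (c-x) x^(nu-2)."""
    if 0.0 < x < c:
        return nu * (nu - 1) * c ** (-nu) * (c - x) * x ** (nu - 2)
    return 0.0


def max_density(x, c, nu):
    """Frequency function of max_j X_ij - b in (b): nu c^-nu x^(nu-1)."""
    if 0.0 < x < c:
        return nu * c ** (-nu) * x ** (nu - 1)
    return 0.0


def satisfies_condition_d(density, c_r, c_s, nu, grid):
    """Check f_r(x) f_s(y) >= f_s(x) f_r(y) for all grid pairs x < y, with c_r < c_s."""
    if not c_r < c_s:
        raise ValueError("expected c_r < c_s")
    for i, x in enumerate(grid):
        for y in grid[i + 1:]:
            lhs = density(x, c_r, nu) * density(y, c_s, nu)
            rhs = density(x, c_s, nu) * density(y, c_r, nu)
            # Small tolerance: for max_density the two sides are equal in
            # exact arithmetic whenever both points lie below c_r.
            if lhs < rhs - 1e-12 * (1.0 + abs(rhs)):
                return False
    return True


grid = [0.05 * j for j in range(1, 60)]  # increasing points in (0, 3)
print(satisfies_condition_d(range_density, 1.5, 3.0, 3, grid))
print(satisfies_condition_d(max_density, 1.5, 3.0, 3, grid))
```

Both checks are expected to succeed on this grid, in line with the claim that (D) holds for either choice of statistic.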

It is important to observe that the uniformly best procedures according to
(a) and (b) are not identical, and choosing between them is outside the scope of
Theorem 1. Note also that the statistics $X^*_i$ are sufficient for the $c_i$'s. Therefore,
corresponding to any decision rule there exists a decision rule which is defined in
terms of the $X^*_i$'s and has the same risk function. In particular, there exists a
decision rule in class (b) which is equivalent to the uniformly best impartial rule
in class (a). It would be interesting to know whether this equivalent rule is also
an impartial one.

The two examples given above are purely illustrative, and the reader will 
readily construct others in which the statistician is faced with similar problems of 
decision. The second example does not, strictly speaking, belong to Case 2, and 
the reader is urged to consider some specific instances of this Case. There are 
various modifications of “the problem of the greatest one” which may be indi- 
cated here very briefly. These modifications are introduced by placing restrictions 
on the class of possible decisions. For example, in Example 1 the statistician may 
be required to select two or more varieties, and to assign proportions of the land 
to the varieties which he selects in such a way that no variety takes more than 
two-thirds of the available land. In that case, the uniformly best procedure (in
the class of all impartial procedures which are based on the $\bar{X}_i$'s and $S^2$) would
be to assign two-thirds of the land to the variety with the greatest observed mean
yield, and the remainder to the variety with the next greatest. The proof is a
slight elaboration of the proof of Theorem 1 and is left to the reader. Again, in
Example 2 the consumer may wish to obtain all the articles which he requires from
some one manufacturer. In that case, assuming that an impartial selection rule
based on the $X^*_i$'s is to be used, it follows trivially from the case considered
previously that the uniformly best procedure is to select the manufacturer with
the greatest $X^*_i$. This is intuitively obvious, but the obvious requires proof (i.e.
verification of (D)), as may be seen by turning to Example 3.

The intuitive notion referred to above is one which is employed quite frequently
in practice. It may be described as follows. Let $X_1$ and $X_2$ be independent
and similar estimates of unknown parameters $m_1$ and $m_2$, and suppose that in a
given instance we have $X_1 > X_2$. “Then it is more reasonable to suppose that
$m_1 > m_2$ than to suppose that $m_1 < m_2$.” Theorem 2 shows that this notion
is well-founded if and only if condition (D) is satisfied, with $\beta = (m_1, m_2)$. The
condition states essentially that “the likelihood of the greater estimate corresponding
to the greater parameter is always $\ge$ the likelihood of the contrary
event,” and it should be observed that $X_1$, $X_2$ being ‘good’ estimates (e.g.
maximum likelihood estimates) does not ensure that this will be the case. The
following application of Theorem 2 is an illustration of these remarks.

EXAMPLE 3. Suppose that $\pi_i$, $i = 1, 2$ are Cauchy-type populations having
medians $m_i$, and that the set $\Omega$ of possible points $\omega = (m_1, m_2)$ consists of just
the two points $\omega_1 = (1, -1)$ and $\omega_2 = (-1, 1)$. $X_1$ and $X_2$ are single observations
from the two populations, and the statistician is required to decide which
population has the greater median.

Here it would be reasonable for the statistician to use a decision rule, say $d^*$,
which minimizes $r(d \mid \omega) = P(\text{incorrect decision} \mid \omega, d)$, where “$\pi_1$ has the
greater median” and “$\pi_2$ has the greater median” are the two possible decisions.
That this risk function is included in the scheme described by (A) and (C) may
be seen as follows. Let the only admissible values of $s$ be $(1, 0)$ and $(0, 1)$,
corresponding to the decisions “$m_1 > m_2$” and “$m_1 < m_2$” respectively, and setting
$\beta(\omega) = (m_1, m_2)$, define $\gamma(\omega_1) = (1, 0)$, $\gamma(\omega_2) = (0, 1)$. Then for any $d$ such that
$s(d)$ equals $(1, 0)$ or $(0, 1)$ only, the expected value of $W$ is for either $\omega$ the
probability of error in using the rule $d$.

Now, if $d = d(X_{(1)}, X_{(2)}) = [\lambda_1, \lambda_2]$ is any impartial decision rule, it will equal
either $[1, 0]$ or $[0, 1]$, corresponding to the decisions “the population with the
greater X has the smaller median” and “the population with the greater X has
the greater median” respectively. Since the frequency function of $X_i$ is
$f_i(x) = 1/\pi[1 + (x - m_i)^2]$, a little calculation shows that in the class of impartial
decision rules a uniformly best one exists, and is given by

$$d^* = \begin{cases} [1, 0] & \text{if } X_{(1)} X_{(2)} > 2, \\ [0, 1] & \text{otherwise.} \end{cases}$$
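
The “little calculation” can be checked numerically. With $\mu = -1$ and $\nu = 1$, a point $(t_1, t_2)$ with $t_1 < t_2$ belongs to the set $R(\omega)$ of Theorem 2 when $f(t_1, \mu) f(t_2, \nu) < f(t_1, \nu) f(t_2, \mu)$, and clearing denominators reduces this to $t_1 t_2 > 2$. The sketch below compares the two descriptions on a grid; the function names and the grid are illustrative assumptions, and grid points lying exactly on the boundary $t_1 t_2 = 2$ are skipped because the two likelihoods coincide there.

```python
import math


def cauchy_density(x, median):
    """Cauchy frequency function 1 / (pi * [1 + (x - median)^2])."""
    return 1.0 / (math.pi * (1.0 + (x - median) ** 2))


def favours_reversed_order(t1, t2, mu=-1.0, nu=1.0):
    """True when (t1, t2), t1 < t2, lies in R(omega) of Theorem 2.

    That is, the smaller observation is more likely to have come from the
    population with the larger median than from the one with the smaller.
    """
    natural = cauchy_density(t1, mu) * cauchy_density(t2, nu)
    swapped = cauchy_density(t1, nu) * cauchy_density(t2, mu)
    return natural < swapped


def agrees_with_product_rule(points, margin=1e-9):
    """Compare membership in R(omega) with the closed form t1 * t2 > 2."""
    for t1, t2 in points:
        if abs(t1 * t2 - 2.0) < margin:
            continue  # boundary: the two likelihoods are equal
        if favours_reversed_order(t1, t2) != (t1 * t2 > 2.0):
            return False
    return True


# Grid of ordered pairs t1 < t2 in [-5, 5] with step 0.25.
points = [(-5 + 0.25 * i, -5 + 0.25 * j)
          for i in range(41) for j in range(41) if i < j]
print(agrees_with_product_rule(points))
```

The region $t_1 t_2 > 2$ has positive measure, so the intuitive rule “the greater observation indicates the greater median” is not uniformly best here, which is the point of the example.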


In conclusion, we remind the reader that although the weight function W 
defined according to (A) is general enough to include all problems of the type 
considered in this paper, the sampling scheme as also the class of decision rules 
to which our results apply is very limited. We have (i) assumed that the samples 
from the k populations are all of the same size, and (ii) given no objective criterion 
for choosing appropriate statistics, and no justification for the use of impartial 
decision rules based on these ‘‘appropriate statistics.”’ In view of the applications, 
it would be of interest to extend the general argument of this paper to the 
numerous situations where Theorem 1 does not apply or is otherwise unsuitable. 

The problem of selection was suggested to the author by Professor Hotelling. 
The author would like to acknowledge his indebtedness also to Professor Robbins. 
This paper could not have been written without his constant encouragement and 
helpful suggestions.


COMPLETENESS IN THE SEQUENTIAL CASE


By E. L. LEHMANN AND CHARLES STEIN 


University of California, Berkeley 


1. Summary. Recently, in a series of papers, Girshick, Mosteller, Savage and 
Wolfowitz have considered the uniqueness of unbiased estimates depending only 
on an appropriate sufficient statistic for sequential sampling schemes of binomial 
variables. A complete solution was obtained under the restriction to bounded 
estimates. This work, which has immediate consequences with respect to the 
existence of unbiased estimates with uniformly minimum variance, is extended 
here in two directions. A general necessary condition for uniqueness is found, 
and this is applied to obtain a complete solution of the uniqueness problem when 
the random variables have a Poisson or rectangular distribution. Necessary 
and sufficient conditions are also found in the binomial case without the restric- 
tion to bounded estimates. This permits the statement of a somewhat stronger 
optimum property for the estimates, and is applicable to the estimation of 
unbounded functions of the unknown probability. 


2. Introduction. The notions of completeness and bounded completeness of 
a family of distributions were introduced in [1, 2] in connection with the prob- 
lems of similar regions and unbiased estimation. The question of whether either 
of these two properties pertains to various families of distributions that are of 
interest in statistics was discussed in [2] under the assumption of fixed sample 
size. The only sequential problems of this kind that have been treated in the 
literature (with quite different terminology) refer to the binomial case. For 
this case Girshick, Mosteller and Savage [3] found necessary (and also certain 
sufficient) conditions on the sequential sampling scheme for completeness, while 
Wolfowitz [4] and Savage [5] gave necessary and sufficient conditions for bounded 
completeness. 

If $T$ is a random variable distributed over an additive class of sets in some
space according to a distribution $P^T_\theta$ with $\theta$ in some set $\omega$, then the family
$\mathcal{P}^T = \{P^T_\theta \mid \theta \in \omega\}$ of possible distributions of $T$ is said to be complete if

$$\int f(t) \, dP^T_\theta(t) = 0, \quad \text{for all } \theta \in \omega, \tag{1}$$

implies

$$f(t) = 0, \quad \text{a.e. } \mathcal{P}^T, \tag{2}$$

that is, for all $t$ except possibly in a set $N$ for which $P^T_\theta(N) = 0$ for all $\theta \in \omega$.
The family $\mathcal{P}^T$ is said to be boundedly complete if this implication holds under
the assumption that $f$ is bounded.
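
For a concrete, fixed-sample-size instance of this definition (an illustration, not taken from the paper), let $T$ be binomial with $n$ trials and success probability $p$. Condition (1) at $n + 1$ distinct values of $p$ is a homogeneous linear system in the unknowns $f(0), \cdots, f(n)$, and completeness follows once its coefficient matrix is non-singular. The sketch below, with hypothetical function names and an arbitrary choice of $p$ values, computes the determinant exactly with rational arithmetic.

```python
from fractions import Fraction
from math import comb


def binomial_matrix(n, probs):
    """Rows: values of p; columns: t = 0..n; entries C(n,t) p^t (1-p)^(n-t)."""
    return [[comb(n, t) * p ** t * (1 - p) ** (n - t) for t in range(n + 1)]
            for p in probs]


def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [row[:] for row in matrix]
    size = len(a)
    det = Fraction(1)
    for col in range(size):
        # Find a row at or below the diagonal with a non-zero pivot.
        pivot = next((r for r in range(col, size) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap changes the sign
        det *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return det


n = 4
probs = [Fraction(j, n + 2) for j in range(1, n + 2)]  # n + 1 distinct values in (0, 1)
det = determinant(binomial_matrix(n, probs))
print(det != 0)  # non-singular: only f = 0 satisfies (1) at these p
```

A non-zero determinant means that the only $f$ with zero expectation at these $n + 1$ parameter values is $f \equiv 0$, so the fixed-sample binomial family is complete. The sequential case treated in the paper is harder precisely because no such finite system is available.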
The relation of these concepts to the problem of unbiased estimation is an
immediate consequence of a theorem of Blackwell [6]. Let $X$ be a random variable
with distribution $P^X_\theta$, $\theta \in \omega$, and let $T$ be a sufficient statistic for $\theta$. Denote
by $P^T_\theta$ the distribution of $T$, and suppose that $\mathcal{P}^T$ is complete. Then every function
$g(\theta)$ for which there exists an unbiased estimate, that is, a function $\phi$ such
that

$$E_\theta \phi(X) = g(\theta), \quad \text{for all } \theta \in \omega,$$

possesses an unbiased estimate with uniformly minimum variance. One can say
furthermore that if $\phi(X)$ is any unbiased or bounded unbiased estimate of $g(\theta)$,
then the optimum estimate guaranteed by the above statements is the conditional
expectation of $\phi(X)$ given $T$.

The aim of the present paper is to obtain certain results concerning complete- 
ness in sequential sampling schemes. Some necessary conditions for complete- 
ness are given in section 3, and these are used to obtain necessary and sufficient 
conditions for completeness when the random variable being sampled has a 
Poisson or rectangular distribution. In section 4 it is shown that certain neces- 
sary conditions given in [3] for the binomial case are also sufficient. 


3. A necessary condition for completeness. The sequential sampling schemes
with which we are concerned are of the following nature. There is given a sequence
of real valued random variables $X_1, X_2, \cdots$ with a joint distribution depending
on a real parameter $\theta$, which ranges over a set $\omega$. We shall assume that for
each $m$ the set of variables $X_1, \cdots, X_m$ admits a real valued sufficient statistic
$T_m = t_m(X_1, \cdots, X_m)$ for $\theta$, and that for each $m$ the family $\mathcal{P}^{T_m}$ of
distributions of $T_m$ is complete. We next suppose that there is given a stopping rule,
which is such that after $m$ observations have been taken, the decision of whether
or not to take an $(m+1)$st observation depends only on the value of
$t_m(X_1, \cdots, X_m)$. It follows (see [6]) that if the total number of observations is $n$
(a random variable which may be infinite), then $(T_n, n)$ is a sufficient statistic
for $\theta$. We shall say that the sequential procedure is complete if the family of
distributions of $(T_n, n)$ is complete. Throughout, we shall assume that all
sequential procedures in question are closed, i.e. that for each $\theta \in \omega$, $n$ is finite
with probability 1.

Let $Y$ be a random variable distributed over a Euclidean space according to
a distribution $P_\theta$ with $\theta$ in $\omega$. We shall say that a point $y$ lies in the positive
sample space of $Y$ if there exists $\theta \in \omega$ such that every open set containing $y$
has positive probability for this $\theta$, and that $y$ is an impossible point if it lies in
the complement of the positive sample space. Consider now a sequential sampling
scheme as described above. For any integers $m < p$ we shall denote by $W_p^m$ the
positive sample space of $T_p$ given the first $m$ steps of the stopping rule, that is,
given for $i = 1, \cdots, m$ the set $S_i$ of values of $T_i$ for which sampling is discontinued
after the $i$th observation. Since all the $T$'s are real valued, the sets $W_p^m$
are sets of real numbers satisfying the obvious condition $W_p^m \supseteq W_p^{m+1}$. The
union $\cup S_m$ ($S_m$ is the set of points of $W_m^{m-1}$ for which no $(m+1)$st observation is
taken) will be called the set of stopping or boundary points, the points belonging
to some $W_m^{m-1} - S_m$ are the continuation points.

We need the following 

LEMMA 1. A necessary condition for a sequential procedure of the type described
above to be complete is that every procedure obtained from the given one by truncation
be complete.^1

This is an immediate consequence of the following more general

LEMMA 2. Let $X_1, X_2, \cdots$ be as before a sequence of random variables such
that for each $m$ the set $X_1, \cdots, X_m$ admits a real valued sufficient statistic
$T_m = t_m(X_1, \cdots, X_m)$. Let $\Sigma_1, \Sigma_2, \cdots, \Sigma_r$ each be a complete, closed, sequential
procedure based on these sufficient statistics. Let $\Sigma_1 \cup \Sigma_2 \cup \cdots \cup \Sigma_r$ denote the
sequential procedure according to which we continue taking observations until at least
one of the stopping rules $\Sigma_1, \cdots, \Sigma_r$ tells us to stop. Then the procedure
$\Sigma_1 \cup \cdots \cup \Sigma_r$ is complete.

This clearly implies Lemma 1. For if one takes for $\Sigma_1$ any closed, complete
sequential procedure and for $\Sigma_2$ a procedure of fixed sample size, then $\Sigma_1 \cup \Sigma_2$
is the associated truncated procedure.

PROOF OF LEMMA 2. It is sufficient to prove the result for the case $r = 2$.

Let $n_1$, $n_2$, $n$ denote the number of observations taken under $\Sigma_1$, $\Sigma_2$, $\Sigma_1 \cup \Sigma_2$
respectively. Then $n = n_1$ if $n_1 \le n_2$, $n = n_2$ if $n_1 \ge n_2$. Let $f$ be any function
on $\Sigma_1 \cup \Sigma_2$ such that

$$E_\theta f(T_n, n) = 0 \quad \text{for all } \theta \in \omega.$$

Then

$$E_\theta E[f(T_n, n) \mid T_{n_1}, n_1] = 0, \qquad E_\theta E[f(T_n, n) \mid T_{n_2}, n_2] = 0 \quad \text{for all } \theta \in \omega.$$

Since $\Sigma_1$ and $\Sigma_2$ are complete it follows that

$$E[f(T_n, n) \mid T_{n_1} = t_1, n_1 = \nu_1] = E[f(T_n, n) \mid T_{n_2} = t_2, n_2 = \nu_2] = 0, \quad \text{a.e.}$$

Hence

$$0 = P(n_1 \le n_2 \mid T_{n_1} = t_1, n_1 = \nu_1) f(t_1, \nu_1)
+ P(n_1 > n_2 \mid T_{n_1} = t_1, n_1 = \nu_1) E[f(T_{n_2}, n_2) \mid T_{n_1} = t_1, n_1 = \nu_1, n_1 > n_2], \tag{3}$$

and the analogous condition holds with the subscripts 1 and 2 interchanged.

We shall prove that $f(T_n, n) = 0$, a.e., by induction over the possible values
of $n$. Suppose, therefore, that for some integer $m$

$$P_\theta(n \le m, f(T_n, n) \ne 0) = 0.$$

(This is certainly true for $m = 0$.) It then follows that if we take $\nu_1 = m + 1$
in (3) the second term of the right hand side vanishes, so that

$$0 = P(n_1 \le n_2 \mid T_{n_1} = t_1, n_1 = m + 1) f(t_1, m + 1), \quad \text{a.e.}$$

1 The authors would like to thank Mr. E. Fay for pointing out an error in the original
proof of this Lemma.

Hence,

$$P_\theta(n = n_1 = m + 1, f(T_{n_1}, n_1) \ne 0)
\le P_\theta(n = n_1 = m + 1, P(n_1 \le n_2 \mid T_{n_1}, n_1) = 0) = 0.$$

Analogously we see that

$$P_\theta(n = n_2 = m + 1, f(T_{n_2}, n_2) \ne 0) = 0$$

and, adding, that

$$P_\theta(n = m + 1, f(T_n, n) \ne 0) = 0.$$

This completes the induction.
We need further the notion of strong completeness. Consider a random
variable $W = (U, V)$, suppose that the distribution of $W$ depends on $\theta$, and that
$U$ is a sufficient statistic for $\theta$. Let $P^V_u$ be the conditional distribution of $V$ given
$U = u$ (this is independent of $\theta$ since $U$ is a sufficient statistic for $\theta$) and let
$\mathcal{P}^{V|U} = \{P^V_u\}$. We say that the pair $\mathcal{P}^U$, $\mathcal{P}^{V|U}$ is strongly complete if the conditions

(i) $E_\theta f(V)$ exists for all $\theta$,

(ii) $E(f(V) \mid U = u) = 0$ for almost all $u$,

imply

$$f(v) = 0, \quad \text{a.e. } \mathcal{P}^V.$$

For brevity, we shall then usually say that $\{P^V_u\}$ is strongly complete.

We can now state the following necessary condition for completeness. 

THEOREM. If a closed sequential procedure of the type considered above is complete,
then

(i) $S_m$ is almost empty for every $m$ for which $W_{m+1}^{m-1} - W_{m+1}^{m}$ is almost empty,

(ii) for each $m$ for which $S_m$ is not almost empty, the family of conditional
distributions of $T_m$ given $T_{m+1} = t$ (as $t$ ranges over $W_{m+1}^{m-1} - W_{m+1}^{m}$) is strongly complete.

PROOF. For any $t \in W_{m+1}^{m-1} - W_{m+1}^{m}$ the positive sample space of $T_m$ given $T_{m+1} = t$
is clearly contained in $S_m$. Suppose first that (ii) is violated and consider the
sequential procedure obtained from the given one by truncation after $m + 1$ observations.
By the lemma it will be enough to show that the truncated procedure
is not complete. For this purpose let us assume that regardless of the stopping
rule all $m + 1$ variables $X_1, \cdots, X_{m+1}$ are observed. We want to construct an
estimate of zero based on the sufficient statistic for the truncated procedure.
This estimate must be a function of $T_1$ for $T_1 \in S_1$, of $T_2$ for $T_2 \in S_2$, etc. That is,
although we may imagine that the full sample of size $m + 1$ is taken, we must
be careful not to use observations that are impossible when the stopping rule
is followed.

We shall now show that there exists an unbiased estimate of zero which is
zero over $S_1, \cdots, S_{m-1}$, equal to $f(T_m)$ on $S_m$ and $g(T_{m+1})$ on $W_{m+1}^{m}$ where $f$
and $g$ will be defined below. Since expectation equals expectation of conditional
expectation, a statistic is an unbiased estimate of zero if its expectation exists
and its conditional expectation given $T_{m+1} = t$ is zero for almost all $t$. In our
case this condition is equivalent to

$$\int_{S_m} f(u) \, dP(u \mid T_{m+1} = t) + g(t) \int_{W_m^{m-1} - S_m} dP(u \mid T_{m+1} = t) = 0
\quad \text{for almost all } t \in W_{m+1}^{m}, \tag{4}$$

$$\int_{S_m} f(u) \, dP(u \mid T_{m+1} = t) = 0 \tag{5}$$

for almost all $t \notin W_{m+1}^{m}$, i.e. for almost all $t \in W_{m+1}^{m-1} - W_{m+1}^{m}$, since
$t \notin W_{m+1}^{m-1}$ implies $P(S_m \mid T_{m+1} = t) = 0$,
together with the existence of $E_\theta(f(T_m) \mid n = m)$ and $E_\theta(g(T_{m+1}) \mid n = m + 1)$.
Since (ii) does not hold there exists $f$ not vanishing a.e. such that
$E_\theta(f(T_m) \mid n = m)$ exists and (5) is satisfied. If $g$ is defined by (4),
$E_\theta(g(T_{m+1}) \mid n = m + 1)$ exists, and this completes the proof of the necessity
of (ii).

The necessity of (i) is now obvious. For if (i) is violated, then (5) is satisfied
vacuously, and we can take $f$ to be an arbitrary positive valued function (for
example) and (4) will then be satisfied.

As immediate consequences of this theorem we shall obtain two conditions, 
which are easier to apply than condition (ii). 

COROLLARY 1. A necessary condition for completeness is that for no $m$ there
exists a subset $A$ of $S_m$ such that

$$P_\theta(A) > 0 \quad \text{for some } \theta$$

and

$$P(A \mid T_{m+1} = t) = 0 \quad \text{for almost all } t \in W_{m+1}^{m-1} - W_{m+1}^{m}.$$

COROLLARY 2. Suppose that the sequence of X's is such that in the non-sequential
case for all $m$, $p$ with $m < p$ the positive sample space of $T_m$ given $T_p = t$ is the
intersection of the unconditional positive sample space of $T_m$ with the interval $(0, t]$.
Then a necessary condition for a sequential procedure to be complete is that each
$S_m$ differ from a half-open interval (possibly empty) $[a_m, b_m)$ with $a_m \le b_m$, $a_1 = 0$,
$a_{m+1} = b_m$, by a set of probability 0.

PROOF. Let $r$ be the first value of $m$ for which this condition is not satisfied.
Then there exists $c > b_{r-1}$ such that the sets $S_r \cap [c, \infty)$ and $S_r \cap [b_{r-1}, c)$ both
have positive probability. The result now follows from Corollary 1 if one puts
$A = S_r \cap [c, \infty)$.

Next we consider some examples. 

EXAMPLE 1. Let $X_1, X_2, \cdots$ be independently normally distributed with
known variance and unknown mean $\theta$. In this case $T_m = \sum_{i=1}^{m} X_i$, and since
the positive sample space of $T_{m+1}$ is the infinite interval regardless of the values
of $T_1, \cdots, T_m$ it follows from condition (i) of the theorem that no sequential
procedure is complete, with the trivial exception of the procedures with fixed
sample size.

EXAMPLE 2. Let $X_1, X_2, \cdots$ be independently uniformly distributed over
the interval $(0, \theta)$, $0 < \theta < \infty$. Then $T_m = \max(X_1, \cdots, X_m)$ and Corollary 2
gives a necessary condition for completeness. If the procedure is truncated we
can deduce sufficiency of this condition from (5). However, this proof does not
apply to the general case. The following proof of sufficiency is similar to some
of the proofs in [3, 4, 5].

Suppose $S_1, S_2, \cdots$ form a set of adjoining intervals (some of them possibly
empty), $S_m = [a_m, b_m)$, and suppose there is a non-zero unbiased estimate of
zero, $\Phi = \phi(T_n, n)$. Let $m$ be the smallest integer for which $\phi$ is not zero almost
everywhere on $S_m$. Then

$$E_\theta(\Phi) = P_\theta(n = m) E_\theta(\Phi \mid n = m) + \sum_{j=m+1}^{\infty} P_\theta(n = j) E_\theta(\Phi \mid n = j) = 0,$$

and hence

$$P_\theta(n = m) E_\theta(\Phi \mid n = m) = - \sum_{j=m+1}^{\infty} P_\theta(n = j) E_\theta(\Phi \mid n = j). \tag{6}$$

Now the right hand side of (6) is zero when $\theta \le b_m$, since it is then impossible
that $T_j \in S_j$ for any $j > m$. Hence

$$E_\theta[\phi(T_m, m) \mid a_m \le T_m < b_m] = 0 \quad \text{for all } \theta \le b_m,$$

and therefore

$$\int_{a_m}^{\theta} \phi(x, m) x^{m-1} \, dx = 0 \quad \text{for all } \theta \text{ in } [a_m, b_m).$$

But this implies $\phi(x, m) = 0$ almost everywhere in $S_m$, which is a contradiction.

EXAMPLE 3. Let $X_1, X_2, \cdots$ be independently distributed according to a
Poisson distribution with mean $\theta$. Then $T_m = \sum_{i=1}^{m} X_i$ and again we can apply
Corollary 2. To prove sufficiency we proceed as in example 2. If the condition of
Corollary 2 is satisfied we may write without ambiguity $\psi(T_n)$ for $\phi(T_n, n)$.
Let $c$ be the smallest value of $T_n$ for which $\psi(T_n) \ne 0$. Then if the probability
of $T_n = j$ is $k(j) \theta^j e^{-n\theta}$, the identity $E_\theta(\Phi) = 0$ implies

$$\psi(c) k(c) \theta^{c} e^{-n\theta} = - \sum_{j=c+1}^{\infty} \psi(j) k(j) \theta^{j} e^{-n\theta},$$

where in each term $n$ is the sample size corresponding to the value of $T_n$ in question.

Dividing this equation by $\theta^c$ and letting $\theta$ tend to zero we see that the right
hand side tends to zero, which implies $\psi(c) = 0$ and hence a contradiction.


4. The binomial case. As was mentioned in section 1, the problem of bounded 
completeness was solved for the binomial case in [3, 4, 5]. Since presumably one is 
unwilling to estimate the bounded parameter p by means of an unbounded 
estimate, further work here may seem unnecessary. However, the problem of 
completeness seems to be of interest for two reasons. If the procedure is boundedly
complete without being complete then, even though one may be reluctant
to use such an estimate, there may exist an unbounded unbiased estimate of p, 
which for some values of p has smaller variance than the minimum variance 
bounded estimate. (An example of this is given in [2]). Since this possibility is 
ruled out when the procedure is complete it is seen that completeness permits 
statement of a stronger optimum property. Apart from this one may be interested 
in estimating some unbounded function of p such as 1/p. In this case bounded 
completeness does not permit any statements concerning existence of optimum 
estimates. 

In the present section we shall change our notation somewhat. We are con- 
cerned with a sequence of independent trials with constant probability p of 
success. On the basis of n trials the total number y of successes is a sufficient
statistic for p. Instead of representing the sufficient statistic for the sequential
procedure by (y, n), we shall use the representation (x, y) where x is the total
number of failures, so that x + y = n. The couples (x, y) may be thought of as
making up the points with integral-valued coordinates of the first quadrant 
of an xy-plane, and as before may be classified as boundary points, continuation 
points, and impossible points. Adopting the terminology of [3], we shall call 
the value of x + y the index of the point (x, y), so that the points of index m 
lie on the line x + y = m. 

Girshick, Mosteller and Savage defined a sequential procedure to be simple 
if for each m the continuation points of index m form an interval. They proved 
that a necessary and sufficient condition for a bounded procedure to be complete
is that it be simple. (A procedure is said to be bounded if there exists N
so that the number of observations is $\le N$.) They also showed that in general
simplicity is not sufficient for completeness. However, it was shown later [4, 5] 
that simplicity is sufficient for bounded completeness. 
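
The definition of simplicity can be checked mechanically for a finite list of continuation points. The following is only an illustrative sketch: the function name `is_simple` and the representation of a procedure as a set of `(x, y)` pairs are assumptions made here and do not come from [3].

```python
from collections import defaultdict


def is_simple(continuation_points):
    """Return True if, for every index m = x + y, the continuation points
    of index m form an interval on the line x + y = m (no gaps).

    continuation_points: iterable of (x, y) pairs with x failures and
    y successes. This representation is a hypothetical choice.
    """
    ys_by_index = defaultdict(set)
    for x, y in continuation_points:
        ys_by_index[x + y].add(y)
    for ys in ys_by_index.values():
        ordered = sorted(ys)
        # An interval has consecutive y-values differing by exactly 1.
        if any(hi - lo != 1 for lo, hi in zip(ordered, ordered[1:])):
            return False
    return True
```

For example, the set {(0,0), (1,0), (0,1), (2,0), (0,2)} is not simple, since the continuation points of index 2 omit the point (1,1) lying between them.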

A sequential procedure is said to be closed if the probability of termination is 
unity for every p with 0 < p < 1. It was proved by Girshick, Mosteller and 
Savage that a necessary condition for completeness of a closed sequential pro- 
cedure is that no procedure obtained from the given one by removing a boundary 
point be closed. (Removing a boundary point here means converting it into a 
continuation point.) We shall prove below that this condition together with 
simplicity is also sufficient for completeness. An interesting question is whether 
these two conditions are sufficient for completeness for the general sequential 
schemes considered in section 2, when simplicity is replaced by the condition 
that every procedure obtained from the given one by truncation is complete, 
and when the second condition is modified by the appropriate null set qualifica- 
tions. It is easily seen that both of these conditions are necessary. 

The following definitions will be needed below. A boundary point $(a, b)$ is a
lower (upper) boundary point if for some $z < 0$ ($> 0$) the point $(a + z, b - z)$
is a continuation point. An impossible point $(a, b)$ is a lower (upper) impossible
point if for some $z < 0$ ($> 0$) the point $(a + z, b - z)$ is either a continuation
point or a boundary point.
If the procedure is unbounded every boundary point is either a lower or an
upper boundary point. If it is simple, no point can be both an upper and a lower
boundary point. The same remarks apply to impossible points.

THEOREM. A necessary and sufficient condition for completeness of a closed 
procedure in the binomial case is that 
(i) the procedure is simple, 
and 
(ii) the removal of any boundary point destroys closure. 

PROOF. Necessity was proved in [3] as was sufficiency for bounded procedures.
Sufficiency for unbounded procedures will follow from the following two facts, 
which we shall prove below. 

I. Suppose (i) holds and there exist numbers $a, M > 0$ such that for all boundary
points $(x, y)$ of index $m \ge M$ the ratio $y/x \ge a$. Let $f(x, y)$ be a non-zero
unbiased estimate of zero defined over the set $B$ of boundary points, and let $m_0$
be the smallest index for which there are points with $f(x, y) \ne 0$. Then $f(x, y) = 0$
for all lower boundary points of index $m_0$.

II. If (i) holds and if for every positive number a there exist infinitely many 
boundary points (x, y) with y/x < a, then one may remove any lower boundary 
point without destroying closure. 

Suppose now that a sequential procedure satisfies (i) and (ii). Then, since no
lower boundary point can be removed without destroying closure, it follows
from II. that there exist $a$ and $M$ such that $y/x \ge a$ for all boundary points of
index $\ge M$. Hence if $f(x, y)$ is an unbiased estimate of zero, and if $m_0$ is defined
as in I., $f(x, y) = 0$ for all lower boundary points of index $m_0$. Because of symmetry
the statements concerning upper boundary points analogous to I. and II.
also hold. It then follows analogously that $f(x, y) = 0$ for all upper boundary
points of index $m_0$. But for a simple unbounded procedure every boundary
point is either an upper or a lower boundary point, and hence we obtain a
contradiction with the definition of $m_0$.

Before proving I. and II. we state the following corollary, which generalises 
an example given in [3]. 

COROLLARY. A sequential procedure that is not bounded and that has a finite
non-zero number of lower boundary points is not complete. The analogous result
holds for upper boundary points.

PROOF OF COROLLARY. This follows easily from II., since if a procedure of
this type is to be closed there must exist for each $a > 0$ infinitely many upper
boundary points $(x, y)$ with $y/x \le a$.

In the remainder of the paper we are concerned with the proofs of I. and II. 

PROOF OF I. Assume I. to be false, and let $(x_0, y_0)$ be the lowest boundary
point of index $m_0$ for which $f(x_0, y_0) \ne 0$. Then $y > y_0$ for all other boundary
points $(x, y)$ for which $f(x, y) \ne 0$. Hence if the probability of a point $(x, y)$
is $c(x, y)p^y q^x$ and if $k(x, y) = c(x, y)f(x, y)$,

$$k(x_0, y_0)\,p^{y_0} q^{x_0} = -\sum k(x, y)\,p^{y} q^{x},$$

where the summation extends over all boundary points of index $\ge m_0$ for which
$y > y_0$. Dividing both sides by $p^{y_0}$ we see that

$$k(x_0, y_0)\,q^{x_0} = -p \sum k(x, y)\,p^{y - y_0 - 1} q^{x}.$$

If we can show that the expression multiplying $-p$ on the right hand side
remains bounded as $p$ tends to zero, we have a contradiction. For letting $p$
tend to zero, we would then see that the right hand side tends to zero and the
left hand side to $k(x_0, y_0)$, and hence that $f(x_0, y_0) = 0$.

To prove this, note that

$$\Bigl|\sum k(x, y)\,p^{y - y_0 - 1} q^{x}\Bigr| \le \sum |k(x, y)|\,p^{y - y_0 - 1}.$$

The right hand side is a power series in $p$. We shall show that this series
converges for some $p_0 > 0$. This implies uniform convergence for $|p| < p_0$, and
therefore the series remains bounded as $p$ tends to zero. By assumption there exist
numbers $a$ and $M'$ such that $y/x \ge a$ for all boundary points with $y > M'$. From
now on we shall consider all series as being summed over the set of boundary
points for which $y > M'$ and hence $q^{x} \ge q^{y/a}$. Since only a finite number of
terms are omitted this does not affect any convergence properties.
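Let $0 < p_1 < 1$ and $q_1 = 1 - p_1$. Then, since $f$ is an unbiased estimate of zero, the series

$$\sum k(x, y)\,p_1^{y} q_1^{x}$$

converges absolutely. Hence, so does

$$\sum |k(x, y)|\,p_1^{y - y_0 - 1} q_1^{x} \ge \sum |k(x, y)|\,p_1^{y - y_0 - 1} q_1^{y/a} = q_1^{(y_0 + 1)/a} \sum |k(x, y)|\,p_0^{y - y_0 - 1}, \qquad p_0 = p_1 q_1^{1/a},$$

and consequently the last series is convergent.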

PROOF OF II. Let $R$ be any closed simple procedure satisfying the conditions
of II., and let $(x_0, y_0)$ be any lower boundary point of $R$. We denote by $R^*$ the
procedure obtained from $R$ by taking $(x_0, y_0)$ to be a continuation point and
by $n^*$ the number of observations for $R^*$.

We first prove that any upper impossible point of R is also an impossible 
point of R*. The negation of this would imply that one can get from a lower 
boundary point to an upper impossible point going only through impossible 
points. This would require at least one step of either of the following kinds: 

Lower impossible point → upper impossible point;

Lower boundary point → upper impossible point.

One can easily convince oneself with the aid of a diagram that any procedure 
under which such steps are permitted cannot be simple. 

Let $0 < p, \pi < 1$, and let $a$ be such that $0 < a < p/q$. If $p$ is the true
probability of success, $y/x$ tends in probability to $p/q$, and hence there exists $N$
such that

$$P(y/x \ge a \mid p) > \pi$$

whenever the index of $(x, y)$ exceeds $N$. By assumption there exists $N_1 > N$
and a boundary point $(x_1, y_1)$ of $R^*$ of index $N_1$ such that $y_1/x_1 \le a$. Then the
probability exceeds $\pi$ that the random point $(x, y)$ of index $N_1$ will lie above
$(x_1, y_1)$. Since $(x_1, y_1)$ is a boundary point, the probability is therefore greater
than $\pi$ that the point $(x, y)$ of index $N_1$ is either an upper impossible point for
$R$ and hence impossible for $R^*$, or a stopping or continuation point for $R$. We
have therefore proved that the probability is $> \pi$ that either $n^* < N_1$, or the
point $(x, y)$ of index $N_1$ is a continuation point of $R$.


But given that one has reached a continuation point $(a, b)$ of $R$, there exists
$N_2$ such that

$$P(n^* \le N_2 \mid p, (a, b)) \ge \pi.$$

For

$$P(n^* > N_2 \mid (a, b)) = P(n > N_2 \mid (a, b)) \to 0 \quad \text{as } N_2 \to \infty.$$

Since there are only a finite number of continuation points of index $N_1$, it is
now clear that there exists $N_0$ such that

$$P(n^* \le N_0 \mid p) > \pi + \pi - 1,$$

which can be made arbitrarily close to 1 by proper choice of $\pi$. Therefore $R^*$
is closed.
SOME ESTIMATES AND TESTS BASED ON THE r SMALLEST VALUES
IN A SAMPLE


By John E. Walsh¹


The Rand Corporation


1. Summary. Let us consider a situation where only the r smallest values of 
a sample of size n are available. This paper investigates the case where n is 
large and r is of the form $pn + O(\sqrt{n})$.

Properties of some well known non-parametric point estimates, confidence 
intervals and significance tests for the 100p% point of the population are in- 
vestigated. If the sample is from a normal population, these non-parametric 
estimates and tests have high efficiencies for small values of p (at least 95%
if $p \le 1/10$).

The other results of the paper are restricted to the special case of a normal
population. Asymptotically "best" estimates and tests for the population
percentage points are derived for the case in which the population standard
deviation is known. For the case in which the population standard deviation is
unknown, asymptotically most efficient estimates and tests can be obtained
for the smaller population percentage points by suitable choice of p and $O(\sqrt{n})$.

The results derived have application in the field of life testing. There the
variable associated with an item is the time to failure and the r smallest sample
values can be obtained without the necessity of obtaining the remaining values
of the sample. By starting with a larger number of units but stopping the
experiment when only a small percentage of the units have "died", it is often possible
(using the results of this paper) to obtain the same amount of "information"
with a substantial saving in cost and time over that which would be required
if a smaller number of units were used and the experiment conducted until all
the units have "died". Jacobson called attention to applications of this type
in [1].


2. Introduction and statement of results. In life testing, information con- 
cerning the smaller population percentage points may be of primary interest. 
The principal aim of this paper is to investigate the properties of some well 
known non-parametric estimates and tests of the smaller population percentage 
points which are based on statistics of the type used for the sign test. These 
non-parametric results are easy to apply and have several other desirable prop- 
erties (see Theorem 1 and its discussion). In particular, if the 100p% point 
is to be investigated, it is only necessary to fail approximately 100p% of the 
number of starting items to obtain the required statistics (n large). Thus, if 
the non-parametric results should also happen to be reasonably efficient, they
would appear to be ideal for a life testing situation where a smaller population
percentage point is to be investigated.

¹ The author would like to express his appreciation to Max Halperin for calling
attention to this problem and for valuable advice and assistance in the preparation
of the paper.

Examination shows that life tests of the “wear out” type sometimes yield 
empirical distributions which are approximately normal. Also in many cases an 
approximately normal distribution can be obtained by an appropriate monotonic 
change of variable. Thus the case in which the n observations are a sample from 
a normal population will receive special consideration in this paper. 

Investigation of the efficiency of the non-parametric estimates and tests will 
be limited to the situation where the n observations are a sample from a normal 
population. Three cases will be considered: 

(A). Asymptotic efficiency of the non-parametric results as compared with
the corresponding most efficient results based on the entire sample
(population variance unknown).

(B). Asymptotic efficiency of the non-parametric results as compared with
the corresponding most efficient results based on the $pn + O(\sqrt{n})$
smallest order statistics for the situation where the variance of the
normal population is known.

(C). Asymptotic efficiency of the non-parametric results as compared with
the corresponding most efficient results based on the $\beta n + O(\sqrt{n})$
smallest order statistics where $\beta$ is slightly greater than p (population
variance unknown).

The definition of "asymptotic" efficiency together with some of its properties
is given in Section 3. Only asymptotic efficiencies will be considered.² However,
the efficiencies obtained for the asymptotic case would seem to represent lower
bounds of the efficiencies for the corresponding non-asymptotic cases since
experience indicates that the efficiency of non-parametric results usually
decreases as the sample size increases.

First let us consider case (A). From Theorem 3, the asymptotically most ef- 
ficient results for estimating or testing the 100p% population point on the basis 
of the entire sample (population variance unknown) are furnished by the non- 
central t-statistic. An expression for the asymptotic efficiency of the non-para- 
metric results as compared with the corresponding results based on the non- 
central t-statistic is given in the Corollary to Theorem 3. The reciprocal of this 
efficiency represents the factor by which the original number of starting items 
must be multiplied if the non-parametric results are to asymptotically furnish
the same "information" as the non-central t-statistic applied to the original
number of starting items. Table 1 contains values of this factor. Although a larger
number of starting items are used by the "information equivalent" non-parametric
results, a noticeably smaller number of items are failed. The factor by
which the number of items failed is decreased equals the value of p multiplied
by the factor by which the number of starting items was increased for the
"equivalent" non-parametric result. Table 2 contains a list of some of the resulting
factors.

² Some power function comparisons for the non-asymptotic case were given by Paul
H. Jacobson in [1].

Next consider case (B). The first step in the analysis for this case consists in
obtaining the asymptotically most efficient results. These derivations are
contained in Theorems 4 and 5. The Corollary to Theorem 5 contains an expression
for the asymptotic efficiency of the non-parametric results for case (B).
The factor by which the original number of starting items must be multiplied
to obtain "information equivalent" non-parametric results is obtained in the
same way as for case (A). Table 1 lists values of this factor. In this case both the
number of starting items and the number of items failed are slightly increased
by use of the "equivalent" non-parametric results. The factor by which the
number of items failed is increased equals the corresponding factor for the
increase in number of starting items. For convenience of reference, however, values
of this factor are also given in Table 2. If the variance of the normal population
were unknown, the asymptotic efficiency of the non-parametric results would be
at least as great as that obtained for case (B), and likely greater.


TABLE 1
Asymptotic ratio of total numbers of items tested
(Non-parametric test over most efficient test)

  p      .01     .02     .05     .10     .20     .30     .40
 (A)    377%    270%    190%     ...    150%    158%    155%
 (B)    101%    102%    103%     ...     ...     ...     ...
 (C)    111%    114%    118%     ...    129%    140%    148%


Finally consider case (C). Let p be replaced by $\beta$ in Theorem 5 while the value
of $\beta$ corresponding to a given value of p is defined by the relation in Theorem
6. By suitable choices for the values of $\beta$ and $O(\sqrt{n})$ in Theorem 5, it is possible
to obtain asymptotically most efficient results for the population 100p% point
when the population variance is unknown and only the $\beta n + O(\sqrt{n})$ smallest
values of the sample are available. These results are presented in Theorem 6.
The Corollary to Theorem 6 contains an expression for the asymptotic efficiency
of the non-parametric results as compared with the corresponding results of
Theorem 6. The factor by which the number of starting items must be increased
to obtain "equivalent" non-parametric results is computed as in cases (A) and
(B). Table 1 contains values of this factor. The value of $\beta$ represents the fraction
of starting items which are failed if the estimates and tests of Theorem 6 are
used. Table 2 contains corresponding values of $\beta$ for certain values of p. The
factor by which the number of items failed is decreased equals $p/\beta$ times the
factor by which the number of starting items was increased to obtain the
"equivalent" non-parametric results. Table 2 presents values of this factor.

The results of Theorem 6 furnish an asymptotically efficient method of esti- 
mating and testing the smaller population percentage points while only failing 
a small percentage of the starting items (for the case of normality). Since a larger 
number of items are failed and much more work is required for computing the 
necessary statistics, however, this method is not necessarily preferable to the 
non-parametric method from the viewpoint of “information” per unit cost. In 
many cases the difference in cost will be slight. Since the non-parametric results
are valid under much more general conditions, they would seem to be preferable
for these cases.


TABLE 2
Asymptotic ratio of numbers of items failed
(Non-parametric test over most efficient test)

  p      .01     .02     .05     .10     .20     .30     .40
  β     .0113   .0234   .0612   .130    .287    .476    .70
 (A)    3.77%   5.40%   9.50%   16.0%   30.2%   45.9%   62.0%
 (B)    101%    102%    103%    105%    109%    114%    120%
 (C)     99%     98%     96%     94%     90%     88%     85%


3. Definition of asymptotic efficiency. In this section the n observations are
assumed to be a sample from a normal population. Let the 100p% point of the
population be denoted by $\theta_p$. Several classes of results for investigating $\theta_p$ are
considered in this paper. For example, the non-parametric estimates and tests
represent one class; the asymptotically most efficient results based on the entire
sample (population variance unknown) represent another class; etc. The results
considered consist of point estimates of $\theta_p$, confidence intervals for $\theta_p$, and
significance tests for $\theta_p$ based on these confidence intervals. For a specified class,
every point estimate and every endpoint of a confidence interval (a one-sided
confidence interval has only one endpoint) consists of some statistic $T$ whose
variance is of the form $\sigma_T^2/n + o(1/n)$ for large n. Here $\sigma_T^2$ is independent of n
and has the same value for all statistics $T$ of the class. Also for every such
statistic $T$ the quantity

$$\sqrt{n}\,(T - \theta_p)/\sigma_T$$

has a distribution which is asymptotically normal with unit variance and some
finite mean A which is independent of the unknown parameters of the normal
population. By suitable choice of $T$, the mean A can be made to have any
specified value.

Now let us define the asymptotic efficiency of the class of non-parametric
results as compared to a class of results of the type defined by (A), (B) or (C).
Let the non-parametric results be based on n sample values while the other class
of results is based on m sample values. Let the common value of $\sigma_T^2$ for the
non-parametric results be denoted by $\sigma_1^2$ while the common value of this quantity for
the other class is denoted by $\sigma_2^2$. If $\sigma_1^2/n = \sigma_2^2/m$ when $m = nE$, then the
asymptotic efficiency of the non-parametric results (compared to the specified class
of results) is defined to be 100E%. For the situations considered in this paper,
E is independent of n, m and the parameters of the normal population.

Asymptotic efficiency, as defined in the preceding paragraph, has the property
that the statistic (or statistics) yielded by a non-parametric result based on n
sample values has approximately the same distribution as the corresponding
statistic (or statistics) based on m sample values from the specified class if $m = nE$
(n large). For example, consider a non-parametric unbiased estimate $T_1$ of $\theta_p$
based on n sample values and an unbiased estimate $T_2$ of $\theta_p$ from the specified
class based on m sample values. Then, if $m = nE$, the distributions of

$$\sqrt{n}\,(T_1 - \theta_p)/\sigma_1, \qquad \sqrt{n}\,(T_2 - \theta_p)/\sigma_1$$

are asymptotically identical (note that $\sigma_1^2/n = \sigma_2^2/m$). Similarly for the
endpoints of confidence intervals. Consequently the power functions of significance
tests based on corresponding confidence intervals are asymptotically identical
if $m = nE$. It would therefore appear that the definition chosen for asymptotic
efficiency is suitable for the situations to which it is applied.


4. Notation. In this paper $t(1), \ldots, t(n)$ will represent the values of the set
of all n observations arranged in increasing order of magnitude. Then

$$t(1), \ldots, t(r)$$

are the r smallest values of the set of n observations. The notation $t(r)$ has
meaning only if r is an integer such that $1 \le r \le n$. Often, however, expressions of
the form $t[pn + O(\sqrt{n})]$ will be encountered. In what follows, an expression of
the form $t(z)$ has the interpretation $t(\text{largest integer} \le z)$. For example,

$$t(487\tfrac{1}{2}) = t(487).$$

Also the $r = pn + O(\sqrt{n})$ smallest observations are frequently referred to;
here r is interpreted to be the largest integer contained in $pn + O(\sqrt{n})$; etc.
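
The convention that $t(z)$ means the order statistic with index equal to the largest integer not exceeding $z$ can be illustrated by a short sketch. The function names `order_statistic` and `smallest_values`, and the constant `c` standing in for the coefficient of the $O(\sqrt{n})$ term, are assumptions introduced here for illustration only.

```python
from math import floor, sqrt


def order_statistic(sample, z):
    """Return t(z): the order statistic whose index is the largest
    integer <= z (indices run from 1 to n, as in the text)."""
    k = floor(z)
    if not 1 <= k <= len(sample):
        raise ValueError("t(z) is defined only for 1 <= floor(z) <= n")
    return sorted(sample)[k - 1]


def smallest_values(sample, p, c=0.0):
    """Return the r smallest observations, where r is the largest
    integer contained in p*n + c*sqrt(n). The constant c is a
    hypothetical stand-in for the O(sqrt(n)) term."""
    n = len(sample)
    r = floor(p * n + c * sqrt(n))
    return sorted(sample)[:r]
```

With the sample 5, 3, 9, 1 the call `order_statistic([5, 3, 9, 1], 2.5)` picks the second smallest value, in the same way that a fractional index is reduced to its integer part in the example above.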


5. Theorems and derivations. First let us consider some well known estimates 
and tests of the population percentage points which are based on statistics of 
the type used for the sign test. These estimates and tests are valid under ex- 
tremely general conditions. It is not necessary that the observations be drawn 
from the same population or even that any two observations come from the 
same population. Population percentage points are not necessarily unique. The 
strongest continuity restriction imposed is that the population cdf be continuous 
at the percentage point considered. These results follow from

THEOREM 1. Let $t(1), \ldots, t(n)$ represent the values of n observations arranged
in increasing order of magnitude. The n observations are statistically independent
and from populations which satisfy the conditions:
(I). The populations have at least one 100p% point in common.
(II). If the populations have only one common 100p% point, the cdf of each
population is continuous at that point.

Let $\theta_p$ denote the value of the common 100p% point if it is unique, or the open
interval of common 100p% points otherwise (i.e., the interval of common 100p% points
with its endpoints deleted). Then asymptotically $(n \to \infty)$

(i). $t(pn)$ is a median estimate of $\theta_p$.

(ii). $\Pr\{t[pn + K_\alpha\sqrt{np(1-p)}] < \theta_p\} = \Pr\{t[pn + K_\alpha\sqrt{np(1-p)}] \le \theta_p\} = \alpha$,

where $K_\alpha$ is the standardized normal deviate exceeded with probability $\alpha$. Relations
(i) and (ii) are approximately satisfied if $pn > 5$ and $p \le \tfrac{1}{2}$.

PROOF. This theorem is a direct application of the binomial theorem.
Conditions (I) and (II) assure that the equality between the probabilities in (ii)
holds. Relations (i) and (ii) are obtained by using the normal approximation to
the binomial theorem; this approximation is reasonably accurate if $pn > 5$ and
$p \le \tfrac{1}{2}$ (see [2]).

The non-parametric confidence intervals investigated are of the forms

$$t[pn + B_1\sqrt{n} + o(\sqrt{n})] < \theta_p, \qquad t[pn + B_2\sqrt{n} + o(\sqrt{n})] > \theta_p,$$
$$t[pn + B_1\sqrt{n} + o(\sqrt{n})] < \theta_p < t[pn + B_2\sqrt{n} + o(\sqrt{n})] \qquad (B_1 < B_2),$$

(these intervals have the same confidence coefficient if $<$ is replaced by $\le$ and
$>$ by $\ge$). The significance tests considered are those obtained from these
confidence intervals while the point estimates of $\theta_p$ are based on single order
statistics of the form $t[pn + B\sqrt{n} + o(\sqrt{n})]$.

When $\theta_p$ is an open interval, (i) and (ii) need interpretation. The meaning of
(i) is that the probability of $t(pn)$ exceeding every value of $\theta_p$ has the value $\tfrac{1}{2}$
and that the probability of it being less than all values of $\theta_p$ also has the value $\tfrac{1}{2}$.
The inequality $t[pn + K_\alpha\sqrt{np(1-p)}] \le \theta_p$ has the interpretation that every
value of $\theta_p$ is greater than or equal to $t[pn + K_\alpha\sqrt{np(1-p)}]$. Similarly for
$t[pn + K_\alpha\sqrt{np(1-p)}] < \theta_p$.

The purpose in introducing the case where $\theta_p$ is an open interval was to point
out that situations where population percentage points are not unique cause 
little difficulty if suitably interpreted. 

Non-parametric results of the type considered in Theorem 1 are also available
when the sample size is not large. For any sample size n, if the conditions of
Theorem 1 are satisfied,

$$\Pr[t(r) < \theta_p] = \Pr[t(r) \le \theta_p] = \sum_{i=r}^{n} \frac{n!}{i!\,(n-i)!}\,p^{i}(1-p)^{n-i}.$$

The probability relations in Theorem 1 were obtained by approximating this
summation for large n. By suitable choice of r, confidence intervals and
significance tests with a wide range of satisfactory confidence coefficients and
significance levels can usually be obtained for a given value of n.
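
The exact summation and the large-sample rule of Theorem 1 (ii) can be compared numerically. The sketch below is only illustrative: the function names are hypothetical, the summation is evaluated through log-gamma terms to avoid overflow, and the sample size 1000 used in the usage note is an arbitrary choice, not a value taken from the paper.

```python
from math import exp, floor, lgamma, log, sqrt
from statistics import NormalDist


def prob_order_stat_below(n, r, p):
    """Exact Pr[t(r) <= theta_p] = sum_{i=r}^{n} C(n, i) p^i (1-p)^(n-i),
    assuming 0 < p < 1 and 1 <= r <= n."""
    total = 0.0
    for i in range(r, n + 1):
        log_term = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                    + i * log(p) + (n - i) * log(1.0 - p))
        total += exp(log_term)
    return total


def index_for_level(n, p, alpha):
    """Index r = largest integer in p*n + K_alpha*sqrt(n*p*(1-p)), where
    K_alpha is the normal deviate exceeded with probability alpha."""
    k_alpha = NormalDist().inv_cdf(1.0 - alpha)
    return floor(p * n + k_alpha * sqrt(n * p * (1.0 - p)))
```

For instance, with n = 1000, p = 0.1 and α = 0.05 the rule gives r = 115, and `prob_order_stat_below(1000, 115, 0.1)` can then be compared with the nominal value 0.05 to judge how close the normal approximation is for that sample size.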

The above discussion emphasizes the generality of application of the
non-parametric estimates and tests. For most practical situations, however, it is
permissible to assume that the observations are a random sample from a
population which has a probability density function that is non-zero over the range of
definition and differentiable several times. Then asymptotically $t(pn)$ is also a
mean estimate of $\theta_p$ (which is now necessarily a single point). Moreover, the
asymptotic distribution of $t[pn + C\sqrt{n} + o(\sqrt{n})]$ can be found in terms of $p$,
$C$, $\theta_p$ and the value of the probability density function at $\theta_p$. These results are
a consequence of

THEOREM 2. Let the population from which the n sample values were drawn have
a pdf $f(t)$ such that $f(t) \ne 0$ over its range of definition and $f'(t)$ exists and is
continuous in some neighborhood of $t = \theta_p$. Then the variable

$$\sqrt{n/[p(1-p)]}\;f(\theta_p)\,\{t[pn + C\sqrt{n} + o(\sqrt{n})] - \theta_p\}$$

has a distribution which approaches the normal distribution with mean

$$C/\sqrt{p(1-p)}$$

and unit variance as $n \to \infty$.

PROOF. If pn is replaced by $pn + C\sqrt{n} + o(\sqrt{n})$, the method used to prove
this theorem is completely analogous to the proof presented on pp. 368-69 of [3].

Now let us consider the asymptotically most efficient results for estimating
and testing $\theta_p$ based on the entire set of observations for the case of a sample
from a normal population (population variance unknown).

THEOREM 3. Let the n observations be a sample from a normal population
(unknown variance $\sigma^2$). Asymptotically the most efficient point estimates, confidence
intervals and significance tests for $\theta_p$ using all the observations are those based on
the non-central t-statistic. The value of $\sigma_T^2$ (see Section 3) for these results based on
the non-central t-statistic is $\sigma^2(1 + K_p^2/2)$.

COROLLARY. For case (A) the asymptotic efficiency of the non-parametric results
equals

$$\frac{100\,(1 + K_p^2/2)}{2\pi p(1-p)\exp(K_p^2)}\ \%.$$

PROOF. The maximum likelihood estimate of $\theta_p$ based on all n sample values is

$$\frac{1}{n}\sum_{i=1}^{n} t(i) - K_p\sqrt{\Bigl\{\sum_{i=1}^{n} t(i)^2 - \frac{1}{n}\Bigl[\sum_{i=1}^{n} t(i)\Bigr]^2\Bigr\}\Big/(n-1)}. \tag{1}$$

This quantity is equivalent to the non-central t-statistic, as can be seen by
multiplying and dividing $[(1) - \theta_p]$ by

$$\sqrt{\Bigl\{\sum_{i=1}^{n} t(i)^2 - \frac{1}{n}\Bigl[\sum_{i=1}^{n} t(i)\Bigr]^2\Bigr\}\Big/(n-1)}.$$

From maximum likelihood theory, (1) is an efficient estimate of $\theta_p$.
Asymptotically $(n \to \infty)$ the variance of (1) is of the form

$$\sigma^2(1 + K_p^2/2)/n + o(1/n),$$

and it is easily seen that the variance of an endpoint of a confidence interval for
$\theta_p$ based on the non-central t-statistic is also of this form. The corollary follows
from combining Theorem 2 with Theorem 3.
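
The combination of Theorem 2 with Theorem 3 can be traced numerically. For a normal population $f(\theta_p) = \exp(-K_p^2/2)/(\sigma\sqrt{2\pi})$, so Theorem 2 gives $\sigma_1^2 = 2\pi p(1-p)\exp(K_p^2)\,\sigma^2$ for the non-parametric results, while Theorem 3 gives $\sigma_2^2 = \sigma^2(1 + K_p^2/2)$; the efficiency is their ratio $E = \sigma_2^2/\sigma_1^2$. The following is a minimal sketch under that reading; the function names are hypothetical.

```python
from math import exp, pi
from statistics import NormalDist


def k_deviate(p):
    """Standardized normal deviate exceeded with probability p."""
    return NormalDist().inv_cdf(1.0 - p)


def efficiency_case_a(p):
    """E = (1 + K_p^2/2) / (2*pi*p*(1-p)*exp(K_p^2)), as a fraction."""
    k2 = k_deviate(p) ** 2
    return (1.0 + k2 / 2.0) / (2.0 * pi * p * (1.0 - p) * exp(k2))


def starting_items_factor(p):
    """Factor by which the number of starting items is multiplied (1/E)."""
    return 1.0 / efficiency_case_a(p)


def failed_items_factor(p):
    """Factor for the number of items failed: p times the factor above."""
    return p * starting_items_factor(p)
```

For p = .01 and p = .05 this gives starting-item factors of roughly 3.76 and 1.90, close to the entries 377% and 190% of Table 1.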

Next let us investigate the situation where only the $r = pn + O(\sqrt{n})$ smallest
values of a sample of size n from a normal population with mean $\mu$ and variance
$\sigma^2$, denoted by $N(\mu, \sigma^2)$, are available. First let us consider the asymptotic
distribution of

$$\left[\frac{\sum_{i=1}^{r} t(i) + 2a_p(n-r)\,t(r)}{r + 2a_p(n-r)} + \frac{(n-r)(b_p + 2a_pK_p)}{r + 2a_p(n-r)}\,\sigma - \mu\right] \Big/ \frac{\sigma}{\sqrt{r + 2a_p(n-r)}}, \tag{2}$$

where

$$a_p = \frac{K_p}{2\sqrt{2\pi}\,(1-p)\exp(\tfrac{1}{2}K_p^2)} + \frac{1}{4\pi(1-p)^2\exp(K_p^2)},$$

$$b_p = \frac{1}{\sqrt{2\pi}\,(1-p)\exp(\tfrac{1}{2}K_p^2)}.$$

This distribution is given by

THEOREM 4. Let $t(1), \ldots, t(r)$ be the $r = pn + O(\sqrt{n})$ smallest values
(arranged in increasing order of magnitude) of a sample of size n from $N(\mu, \sigma^2)$. Then
asymptotically $(n \to \infty)$ the distribution of (2) is $N(0, 1)$.

COROLLARY. Let $r = pn + C\sqrt{n} + o(\sqrt{n})$. Then as n increases the distribution
of

$$\left[\frac{\sum_{i=1}^{r} t(i) + 2a_p(n-r)\,t(r)}{r + 2a_p(n-r)} + \frac{(1-p)(b_p + 2a_pK_p)}{p + 2a_p(1-p)}\,\sigma - \mu\right] \Big/ \frac{\sigma}{\sqrt{r + 2a_p(n-r)}}$$

approaches the normal distribution with unit variance and mean

$$C(b_p + 2a_pK_p)/[p + 2a_p(1-p)]^{3/2}.$$

PROOF. The proof of this theorem is long and will be deferred to section 6 of
the paper.

If the value of $\sigma$ is known, the Corollary to Theorem 4 can be used to obtain
point estimates, confidence intervals and significance tests for any population
percentage point (including $\mu$). The resulting estimates and tests are
asymptotically most efficient. This follows from

THEOREM 5. Consider the $r = pn + O(\sqrt{n})$ smallest values of a sample of size
n from $N(\mu, \sigma^2)$ where $\sigma^2$ is known. Asymptotically $(n \to \infty)$ the variance of every
unbiased estimate of $\mu$ based on only $t(1), \ldots, t(r)$ and $\sigma^2$ is greater than or equal
to a quantity of the form

$$\sigma^2/n[p + 2a_p(1-p)] + o(1/n).$$

COROLLARY. For case (B) the asymptotic efficiency of the non-parametric results is

$$100\,\frac{\exp(-K_p^2)}{2\pi p(1-p)} \Big/ \left\{p + \frac{K_p\exp(-\tfrac{1}{2}K_p^2)}{\sqrt{2\pi}} + \frac{\exp(-K_p^2)}{2\pi(1-p)}\right\}\ \%.$$

PROOF. The proof of this theorem is similar to the proof presented for
Theorem 4 and will be given in section 6 following the proof of Theorem 4.
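
The case (B) efficiency can be evaluated directly from the constants $a_p$ and $b_p$, since $p + 2a_p(1-p) = p + K_p\exp(-\tfrac{1}{2}K_p^2)/\sqrt{2\pi} + \exp(-K_p^2)/[2\pi(1-p)]$. The sketch below assumes the formulas for $a_p$ and $b_p$ as read from the text; the function names are hypothetical.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def k_deviate(p):
    """Standardized normal deviate exceeded with probability p."""
    return NormalDist().inv_cdf(1.0 - p)


def a_and_b(p):
    """Constants a_p and b_p appearing in statistic (2)."""
    k = k_deviate(p)
    b = 1.0 / (sqrt(2.0 * pi) * (1.0 - p) * exp(0.5 * k * k))
    a = (k / (2.0 * sqrt(2.0 * pi) * (1.0 - p) * exp(0.5 * k * k))
         + 1.0 / (4.0 * pi * (1.0 - p) ** 2 * exp(k * k)))
    return a, b


def efficiency_case_b(p):
    """E = [exp(-K_p^2) / (2*pi*p*(1-p))] / [p + 2*a_p*(1-p)], a fraction."""
    a, _ = a_and_b(p)
    k2 = k_deviate(p) ** 2
    return exp(-k2) / (2.0 * pi * p * (1.0 - p)) / (p + 2.0 * a * (1.0 - p))
```

The reciprocal of `efficiency_case_b(p)` is the factor for the number of starting items; for p = .01 and p = .40 it comes out near 1.01 and 1.20, in line with the case (B) entries 101% and 120% of the tables.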

Let p be replaced by $\beta$ in Theorem 4. Even if $\sigma$ is unknown, asymptotically
most efficient estimates and tests can be obtained for the 100p% point of the
population if $\beta$ is defined by

$$K_p = (1-\beta)(b_\beta + 2a_\beta K_\beta)/[\beta + 2a_\beta(1-\beta)]. \tag{3}$$


THEOREM 6. Let p, (0 < p < 4), be given and B defined by (3). Let t(1), --- , t(r) 
be ther = Bn + Cr/n + 0(-Vn) smallest values of a sample of size n from a normal 
population. Then asymptotically 


Px] > t(7) + 2as(n — nace | / tr + 2ag(n — r)] < a} 
1 
1 —C(bg+2agKg) /(8-+2ag(1—8) }3/2 

V/ Qe , 
Coro.uary. For case (C) the asymptotic efficiency of the non-parametric results is 

Kg ex (- : K3) 
exp(=Ks) / {PU P\T 24) | exp (Ks) 

2np(1 — p) Ve 2n(1 — B)/ J“ 

PROOF. Theorem 6 is an immediate consequence of relation (3) and the
Corollary to Theorem 4. The Corollary to Theorem 6 follows from Theorem 2 and
Theorem 6.





—zx2/2 
eo? dy, 


100 





6. Long proofs. This section contains the long proof of Theorem 4 and the
related proof of Theorem 5.
6.1. Proof of Theorem 4. If t(r) is such that 


ra K,o aia a = t(r) < t= K,o + a 


the ratio of the value of the joint probability density function f of t(1), --- , t(r) 
to the value of the function 


n\(l — p)"” (_1 ) i< k _ 7 
(n=l (Tm ad { a7 X  ¢ 
—(n-—r)a eo ot. K,| — (n — r)b E= S + K,} 


’ 





(4)


is of the form $1 + o(1)$. Here (and in the remainder of section 6) $a = a_p$, $b = b_p$.
Also, for large n and any positive e, the integral of f over the ranges of the 
(1), --: , (r — 1) and for ¢(r) between » — Kyo — n*** andy — Kyo + a 
differs from unity by a quantity which is of the order o(1), i.e., a quantity which 
0 asn—> &, 


Now consider the moment generating function of (2), i.e., Efe’). In evaluat- 
ing this function of 0, let the range of integration of ¢(r), (i.e., the range after the 
other variables have been integrated out), be subdivided into the five intervals 


—x to uw— Dao, u— Do to p— Kyo — oe. 
ead K,o — a * to > Kyo + , 
u— Kyo+ n” to uw + Do, u+ Do to ~. 
Here D is a positive constant which is independent of n and such that 
n—r r— n—r | 0 | (n a] r) (b + 2ak ) 
(1/D)""(1/p)1/(1. — pl" < exp | - |0| (n= r)b + 2aK,) 
Vr + 2a(n — r) 


for n sufficiently large and 
D>|Kp|+n-%/s, 1 — N(D) = N(-D) < e”/D, 


where 
a I " ¢™* ay 
V29r Le 
First let us consider the interval 1 — K,o — n-“” tow — K,o +n“. Using 
(4) in place of f, completing the square in the exponent, making the change of 
variable 
a(t) = t(i) — 0//r + 2a(n — 1) (j= 1,---,7n), 
integrating 2(1), --- , 2(r — 1) over their ranges and then x(r) over the interval 
“ _ 9/4/r + 2a(n — 7) to 
p — Kyo + 0" — 0/x/r + 2a(n — 1), 


up—K,o—n 


an expression of the form 
(5) exp (6/2) + 0(1) 
is obtained. From the above results, this expression differs from the correspond- 


ing integration of f by a term of order o(1); hence the contribution to the mgf 
for the interval considered is of the form (5). 


Next consider the interval 1» — K,o + n“” to u + Do. After t(1), --- , 
((r — 1) have been integrated out, the integrand becomes 


sioseigalllomeiies: ae ie —-z 80 Yr 
(r — 1)!(n — r)! {3 o 4p + 2a(n — 1) 
0 fal fo ES 


___20a(n — r) t(r) — p . b(n — 1) 
Vr + 2a(n — =f +K,| ° Vr + 2a(n =}. 


By writing $\{N[(t(r) - \mu)/\sigma - \theta/\sqrt{r + 2a(n - r)}]\}^{r-1}$ in the form
E wal if — 01 + o(1))/Wr + 2a(n — 1) 


o 
-N (@= . *) VIF exp (= ae 
o o 


and maximizing exp {2@a(n — r)[t(r) — u]/orv/r + 2a(n — r)} with respect to 
t(r) in the specified interval, it is seen that the value of (6) is less than an expres- 
sion of the form 


n! exp (C\0V/n) t(r) — pl\"2 Pe) — aT 
(r — 1)!(n — 1)! {a [=#] {1 ~ N [fo] + o(1) 


for n sufficiently large. Differentiation shows that {N[]}" {1 — N[]}" is a 
decreasing function of ¢(r) in the specified interval if n is large enough. Also, if 
t(r) = » — K,o + n“”, for large n the value of 


ie (r—1)n—6/10 
ise 
a 4 
: {1 aa —— [orn _ ~~ 


is less than a constant which is less than unity. Thus the value of (6) is less than 
a quantity of the form 


n! p (1 — p)"" 
(r — 1)!(n — r)! 


which in turn is less than an expression of the form 
CavV/n exp (—C§n") + o(1) 


for n sufficiently large. Thus the integral of (6) over the specified interval is of 
the order o(1). An analogous proof shows that the contribution to the mgf for 
the interval » — Do to np — K,o — n“” is also of order 0(1). 

Finally consider the interval $\mu + D\sigma$ to $\infty$. For large $n$ the integral of (6)
over this interval is less than an expression of the form


ee mi Lea 1-H OL A ]} 300 + ot 


u+De 


exp (—Cjn™”) + o(1), 


i.e., the contribution to the mgf for this interval is of the order o(1) since the 
coefficient of the integral is less than an expression of the form C +~/n. The upper 
limit (7) was obtained by replacing 


N {[t(r) — ul/o — 0/Vr + 2a(n — r)} by 1, 


1 [224] w be -§[2=4]} 
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(1/D)"* (1/p)" [1/(1 — p)]" exp [—0(n — r) (b + 2aK,)/Vr + 2a(n — 7)] 


by 1. 

A similar proof shows that the integral of (6) from $-\infty$ to $\mu - D\sigma$ is also
of the order $o(1)$.

Thus the mgf of (2) is of the form (5) for large $n$ and Theorem 4 is verified.

6.2. Proof of Theorem 5. Let us consider a single sample value from the
multivariate population consisting of the $r$ smallest order statistics of a sample
of size $n$ from $N(\mu, \sigma^2)$, where $\sigma^2$ is known. Then the variance of every unbiased
estimate of $\mu$ based on this sample and the value of $\sigma^2$ is greater than or equal to
the reciprocal of


(8)  $\int \cdots \int \left(\dfrac{\partial \log f}{\partial \mu}\right)^2 f \, dt(1) \cdots dt(r)$

     $= -\int \cdots \int \dfrac{\partial^2 \log f}{\partial \mu^2} \, f \, dt(1) \cdots dt(r)$,


where $f$ is the joint pdf of the $r$ smallest order statistics of a sample of size $n$
from $N(\mu, \sigma^2)$. For proof of this statement see pp. 480-81 of [3]. In the lower part
of (8) the variables $t(1), \cdots, t(r - 1)$ can be integrated out leaving an explicit
function of $t(r)$ to be integrated from $-\infty$ to $\infty$. To evaluate this integral for
large $n$, choose some large but fixed interval $\mu - D\sigma$ to $\mu + D\sigma$ as was done in
the proof of Theorem 4. Using a method similar to that presented on pp. 368-69
of [3], the value of the integral for the interval $\mu - D\sigma$ to $\mu + D\sigma$ is found
to be of the form


$n[p + 2a(1 - p)]/\sigma^2 + o(n)$.


A procedure analogous to that used in the latter part of section 6.1 shows that 
integration outside this interval yields an expression of order $o(n)$.


ON THE RELATIVE EFFICIENCIES OF BAN ESTIMATES¹


By LEO KATZ


Michigan State College


1. Introduction. J. Neyman [3] defined BAN (best asymptotically normal)
estimates as those functions of observed relative frequencies which i) are
consistent, ii) are asymptotically normally distributed, iii) are asymptotically
efficient and iv) possess continuous partial derivatives with respect to each
relative frequency. He suggested the following two problems: first, to determine
the class of estimates which possess the above four properties and second, to
investigate this class of estimates to see whether, and under what conditions,
the use of some of them is preferable to the use of others. Neyman's paper dealt
with the first problem directly and with the second obliquely. With respect to
the first problem, he showed that two types of $\chi^2$-minimum estimates belong to
the class of BAN estimates as do, obviously, maximum likelihood (ML) estimates.
On the second problem, the $\chi^2$-minimum estimates may be more easily computed
than the corresponding ML estimates in many cases, the ease of computation
being especially pronounced for the modified $\chi^2$ with observed, rather than
expected, relative frequencies in the denominators. The present paper contains
some additional information regarding the relative merits of these estimates.

For simplicity, we shall consider a random variable taking on values
$x = 0, 1, 2, 3, \cdots$ with probabilities $p(x \mid \theta_1, \theta_2, \cdots, \theta_r)$ depending on $r$
parameters. In working with $\chi^2$-minimum estimates, it is almost always necessary
to truncate the probability law, taking


(1.1)  $f(x) = p(x \mid \theta_1, \theta_2, \cdots, \theta_r)$,  $x = 0, 1, \cdots, k - 1$,  and

       $f(k) = \sum_{x=k}^{\infty} p(x \mid \theta_1, \theta_2, \cdots, \theta_r)$.
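The truncation in (1.1) can be sketched in a few lines of Python. This is only an illustration: the helper names `poisson_pmf` and `truncate` are hypothetical, and the Poisson law with parameter 0.6 is used merely as a convenient stand-in for $p(x \mid \theta)$.

```python
import math

def poisson_pmf(x, lam):
    """Poisson probability p(x | lam); used here only as an example law."""
    return math.exp(-lam) * lam**x / math.factorial(x)

def truncate(pmf, k):
    """Return [f(0), ..., f(k)] as in (1.1).

    f(x) = p(x) for x = 0, ..., k-1, and f(k) pools the whole tail x >= k,
    computed here as 1 minus the head so that f sums to one.
    """
    head = [pmf(x) for x in range(k)]
    return head + [1.0 - sum(head)]

# Truncate the example law at k = 2: classes {0}, {1}, {2, 3, ...}.
f = truncate(lambda x: poisson_pmf(x, 0.6), k=2)
print(f)
```

The last entry of `f` is the pooled class; choosing a smaller `k` pools more of the original classes into it.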


The ML estimates are asymptotically efficient, i.e., have minimum variance,
with respect to the probability law, $p(x \mid \theta)$, and the $\chi^2$ estimates have the same
property with respect to the truncated p. l., $f(x \mid \theta)$. This suggests that the
optimum variances of the estimates of the parameters of the two in samples of $N$
may differ and, further, that the minimum variance of the $\chi^2$ estimates may
depend essentially upon the choice of $k$. In the course of some unpublished work by
Evelyn Fix and others in the Statistical Laboratory at the University of
California on $\chi^2$ estimation of the parameters of several different p. l.'s the same
anomalous situation occurred repeatedly. When the observed data were fitted
by the truncated p. l. with the estimated parameters, the fit appeared to be
improved when $k$ was chosen smaller. This suggested that perhaps, contrary to
intuition, it might be possible to improve the precision of estimation by
choosing $k$ smaller, within certain limits. This paper proves that this notion is false
and that some other explanation of this phenomenon is needed.


¹ This paper was presented to a joint meeting of the American Mathematical Society
and the Institute of Mathematical Statistics at Boulder, Colorado on September 1, 1949.


2. Relative efficiency. Cramér [1] has shown, simultaneously with Rao [6],
that under mild conditions of regularity, the variance of an unbiased estimate,
$\theta^* = \theta^*(x_1, x_2, \cdots, x_N)$, of a single parameter, $\theta$, where $x_1, x_2, \cdots, x_N$ are
the observed sample, satisfies the following inequality for fixed $N$:


(2.1)  $D^2(\theta^*) \ge \dfrac{1}{N \, E\left[\left(\dfrac{\partial \log p(x)}{\partial \theta}\right)^2\right]}$,


the lower bound being attained only by "efficient" statistics. We may take as
a measure of the relative precision attainable in the estimation of the parameter
of the truncated p. l. (1.1) the ratio of the lower bounds (2.1) of variances of the
estimates of the parameters of the original p. l., $p(x \mid \theta)$, and of the truncated
p. l., $f(x \mid \theta)$. We define


(2.2)  Rel. Eff. $= \dfrac{E\left[\left(\dfrac{\partial \log f(x)}{\partial \theta}\right)^2\right]}{E\left[\left(\dfrac{\partial \log p(x)}{\partial \theta}\right)^2\right]}$.

In the case of functions depending on several parameters, $p(x \mid \theta_1, \theta_2, \cdots, \theta_r)$,
and unbiased estimates, $\theta_i^*$, which are functions of the observed relative
frequencies, with non-singular covariance matrix $\| L_{ij} \|$, Cramér [1] showed that
the fixed ellipsoid,


(2.3)  $N \sum_{i=1}^{r} \sum_{j=1}^{r} b_{ij} t_i t_j = r + 2$,


where


$b_{ij} = E\left[\dfrac{\partial \log p(x)}{\partial \theta_i} \, \dfrac{\partial \log p(x)}{\partial \theta_j}\right]$,


lies wholly within the concentration ellipsoid,


(2.4)  $\sum_{i=1}^{r} \sum_{j=1}^{r} L^{ij} t_i t_j = r + 2$,

where $\| L^{ij} \| = \| L_{ij} \|^{-1}$. The two ellipsoids coincide if and only if the $\theta_i^*$ are
joint efficient estimates of the $\theta_i$. Thus, the covariance matrix of a set of joint
efficient estimates is $\| N b_{ij} \|^{-1}$. In this case, we may define separately the
relative efficiency with respect to each of the parameters as in (2.2) or we may
consider the set of estimates for one function to possess greater concentration
than the set for the other function if the fixed ellipsoid (2.3) for the first lies
wholly within the similar ellipsoid for the second. The latter will be the procedure
we adopt in section 5.


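3. Estimation of a single parameter. With $p(x \mid \theta)$ and $f(x \mid \theta)$ defined as in
(1.1), form the difference


(3.1)  $\varphi(k) = E\left[\left(\dfrac{\partial \log p(x)}{\partial \theta}\right)^2\right] - E\left[\left(\dfrac{\partial \log f(x)}{\partial \theta}\right)^2\right]$.


The regularity conditions under which the Cramér-Rao inequality (2.1) holds
involve existence of $\partial p(x)/\partial \theta$ for all $x$ and absolute convergence of


$\sum_{x} \dfrac{\partial p(x)}{\partial \theta}$.


Assuming we have a regular case of estimation in Cramér's sense so that these
conditions hold, we may write


(3.2)  $\varphi(k) = \sum_{x=k}^{\infty} \dfrac{1}{p(x)} \left[\dfrac{\partial p(x)}{\partial \theta}\right]^2 - \dfrac{1}{f(k)} \left[\dfrac{\partial f(k)}{\partial \theta}\right]^2$,


and, since $\partial f(k)/\partial \theta = \sum_{x=k}^{\infty} (\partial p(x)/\partial \theta)$ by the second of the regularity conditions
above and $f(k) = \sum_{x=k}^{\infty} p(x)$ by (1.1),


(3.3)  $f(k)\,\varphi(k) = \left[\sum_{x=k}^{\infty} p(x)\right] \left[\sum_{x=k}^{\infty} \left(\dfrac{1}{\sqrt{p(x)}} \dfrac{\partial p(x)}{\partial \theta}\right)^2\right] - \left[\sum_{x=k}^{\infty} \dfrac{\partial p(x)}{\partial \theta}\right]^2$.
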


By the Cauchy inequality, the right member of (3.3) is non-negative and, since
$f(k) > 0$, it follows that $\varphi(k) \ge 0$, with the sign of equality holding only when
$\partial p(x)/\partial \theta$ is proportional to $p(x)$ for all $x \ge k$. In this event, $p(x) = K_\theta e^{g(x)}$, where
$K_\theta$ is a constant depending on $\theta$. Now, if $g(x)$ is constant, $p(x)$ is a rectangular
p. l. On the other hand, if $g(x)$ is not constant, there are two cases which must
be considered, namely:


a) $p(x) = K_\theta e^{g(x)}$ for all $x$,
b) $p(x) = p(x \mid \theta)$ for $x < a \le k$,
   $p(x) = K_\theta e^{g(x)}$ for $x \ge a$.


In the first case, $K_\theta = \left(\sum_x e^{g(x)}\right)^{-1}$ and is independent of $\theta$, so that we do not
have a case of estimation at all. In the second case, each $p(x)$ for $x \ge a$ is known
a priori to within a multiplicative constant depending on $\theta$ and, hence, no
essential information is lost in truncation. Thus, except in these trivial cases, the
relative efficiency is less than unity.

It then appears that, in every case of regular estimation, the variance of an
efficient estimate of the parameter of the p. l. $p(x \mid \theta)$ is less than the
corresponding variance for the truncated p. l. $f(x \mid \theta)$ and that, as an immediate
consequence, the ML estimate in general is capable of greater precision than
the $\chi^2$-minimum estimate for fixed $N$. This is the result mentioned in the first
paragraph of section 1. It should be pointed out that the regularity conditions
for the Cramér-Rao inequality are stringent enough to give this result. To
complete the argument for estimation of a single parameter, form the function


(3.4)  $\psi(k) = p(k) \sum_{x=k}^{\infty} p(x) \sum_{x=k+1}^{\infty} p(x) \, [\varphi(k) - \varphi(k + 1)]$,


where $\varphi(k)$ is defined by (3.1). Using (3.1) and (1.1), we may write


(3.5)  $\varphi(k) - \varphi(k + 1) = \dfrac{1}{p(k)} \left[\dfrac{\partial p(k)}{\partial \theta}\right]^2 + \dfrac{1}{\sum_{x=k+1}^{\infty} p(x)} \left[\sum_{x=k+1}^{\infty} \dfrac{\partial p(x)}{\partial \theta}\right]^2 - \dfrac{1}{\sum_{x=k}^{\infty} p(x)} \left[\sum_{x=k}^{\infty} \dfrac{\partial p(x)}{\partial \theta}\right]^2$.

Making use of (3.5), straightforward algebraic reduction of (3.4) gives


(3.6)  $\psi(k) = \left[\dfrac{\partial p(k)}{\partial \theta} \sum_{x=k+1}^{\infty} p(x) - p(k) \sum_{x=k+1}^{\infty} \dfrac{\partial p(x)}{\partial \theta}\right]^2 \ge 0$,


the sign of equality holding again only for the p. l.'s discussed after (3.3). Since
the first three factors in the right member of (3.4) are positive, it follows that
$\varphi(k)$ is a strictly decreasing function of $k$. Thus, the variance of an efficient
estimate of the parameter of a truncated p. l., $f(x)$, depends upon the choice of $k$
and decreases in strictly monotone fashion to the variance of the original p. l.,
$p(x)$, as limit. As a result, the anomalous situation mentioned in the second
paragraph of section 1 does not arise through irregularity in the behavior of this
variance.
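The monotone behaviour of the information loss can be checked numerically. The sketch below evaluates the difference defined in (3.1), written as a tail sum minus a pooled-class term, for a Poisson law with parameter 0.6. The function names, the choice of law, and the cut-off `x_max` used to approximate the infinite sums are all assumptions made for illustration.

```python
import math

def p(x, lam):
    """Poisson probability; zero for negative x."""
    if x < 0:
        return 0.0
    return math.exp(-lam) * lam**x / math.factorial(x)

def dp(x, lam):
    """Derivative of p(x | lam) with respect to lam: p(x-1) - p(x)."""
    return p(x - 1, lam) - p(x, lam)

def phi(k, lam, x_max=60):
    """Information lost by pooling x >= k: tail sum of (dp)^2/p minus the
    pooled term (sum dp)^2 / (sum p); infinite sums are cut at x_max."""
    xs = range(k, x_max + 1)
    tail = sum(p(x, lam) for x in xs)        # f(k)
    dtail = sum(dp(x, lam) for x in xs)      # derivative of f(k)
    return sum(dp(x, lam) ** 2 / p(x, lam) for x in xs) - dtail**2 / tail

lam = 0.6
values = [phi(k, lam) for k in range(1, 7)]
for k, v in zip(range(1, 7), values):
    print(k, v)
```

Each printed value should be positive and smaller than the one before it. Because the Poisson information per observation is 1/lam, the quantity `1 - lam * phi(k, lam)` is the relative efficiency for that `k`.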


4. Poisson and binomial probability laws. The Poisson p. l., $p(x \mid \lambda) = e^{-\lambda} \lambda^x / x!$,
gives immediately


(4.1)  $E\left[\left(\dfrac{\partial \log p(x)}{\partial \lambda}\right)^2\right] = \dfrac{1}{\lambda}$,


whence, from (2.1), we obtain the usual result that the variance of the best
unbiased estimate of $\lambda$ is $\lambda/N$. The truncated p. l. has $\partial \log f(x)/\partial \lambda = (x/\lambda) - 1$
for $x \le (k - 1)$, and $\partial \log f(k)/\partial \lambda = p(k - 1)/\sum_{x=k}^{\infty} p(x)$.
Thus,


(4.2)  $E\left[\left(\dfrac{\partial \log f(x)}{\partial \lambda}\right)^2\right] = \dfrac{1}{\lambda} \left[\sum_{x=0}^{k-1} p(x) + (\lambda - k)\, p(k - 1)\right] + \dfrac{p^2(k - 1)}{\sum_{x=k}^{\infty} p(x)}$.

Writing $P(k - 1)$ for $\sum_{x=0}^{k-1} p(x)$, we obtain finally,


(4.3)  Rel. Eff.$_{\text{Poisson}}(k) = P(k - 1) + (\lambda - k)\, p(k - 1) + \dfrac{\lambda\, p^2(k - 1)}{1 - P(k - 1)}$.

Values of $p(k)$ and $1 - P(k - 1)$ are given directly in Molina's Tables [2] for
integer values of $k$ and $\lambda$ = .001 (.001) .01 (.01) .3 (.1) 15 (1) 100, or may be
obtained indirectly from Pearson's Tables [4] of the incomplete $\Gamma$-function. In
the classical example of a Poisson p. l. quoted by von Bortkiewicz, relating to
numbers of deaths due to kicks by horses in Prussian Army Corps, $N = 200$
and the average number of deaths per corps-year is .61. Either $\chi^2$ procedure would
take $k = 2$ and $\lambda = .6$, approximately. Using these values, we find that Rel. Eff.
$(k = 2 \mid \lambda = .6) = .9508$, i.e., the loss in efficiency incurred by using a $\chi^2$
estimate rather than a ML estimate is of the order of five per cent.
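The Poisson relative efficiency can be evaluated directly rather than from tables. The following is a minimal sketch, assuming the closed form P(k-1) + (lam - k) p(k-1) + lam p(k-1)^2 / (1 - P(k-1)) for the relative efficiency; the function name `poisson_rel_eff` is hypothetical.

```python
import math

def poisson_rel_eff(k, lam):
    """Relative efficiency of the law truncated at k (k >= 1) for a Poisson
    law with parameter lam, using the closed form stated in the lead-in."""
    pmf = lambda x: math.exp(-lam) * lam**x / math.factorial(x)
    P = sum(pmf(x) for x in range(k))      # P(k - 1)
    pk1 = pmf(k - 1)                       # p(k - 1)
    return P + (lam - k) * pk1 + lam * pk1**2 / (1.0 - P)

# The horse-kick example: k = 2, lam = .6.
print(round(poisson_rel_eff(2, 0.6), 4))   # 0.9508
```

Evaluating the same function for larger `k` shows the efficiency rising toward one as fewer classes are pooled.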


’ 


The binomial p. l. is given by $p(x \mid n, \theta) = \binom{n}{x} \theta^x (1 - \theta)^{n-x}$, $x = 0, 1, \cdots, n$,
where $n$ is a known parameter and $\theta$ is the parameter to be estimated from a
sample of $N$ observations. We obtain directly $E[(\partial \log p(x)/\partial \theta)^2] = n/(\theta(1 - \theta))$.
Computing a similar quantity for the truncated p. l. and making use of the
notations $p(x; n) = \binom{n}{x} \theta^x (1 - \theta)^{n-x}$ and $P(a; n) = \sum_{x=0}^{a} p(x; n)$, we obtain, after
some reduction,


(4.4)  Rel. Eff.$_{\text{binomial}}(k) = \dfrac{(n - 1)\theta}{1 - \theta} P(k - 3; n - 2) + \dfrac{1 - 2n\theta}{1 - \theta} P(k - 2; n - 1) + \dfrac{n\theta}{1 - \theta} P(k - 1; n)$

       $+ \dfrac{n\theta}{1 - \theta} \cdot \dfrac{\{P(k - 1; n) - P(k - 2; n - 1)\}^2}{1 - P(k - 1; n)}$.

The form (4.4) is suitable for computation if tables, such as Pearson's Tables
[5], of the incomplete B-function are available covering a range up to the
parameter $n$. If such tables are not available (4.4) is inconvenient since it involves
probabilities associated with three different binomial laws. In this case we may
use the relations


(4.5)  $P(a; n) - P(a - 1; n - 1) = (1 - \theta)\, p(a; n - 1)$,

       $p(a; n) = \dfrac{n\theta}{a}\, p(a - 1; n - 1)$  and

       $p(a; n - 1) = \dfrac{(n - a)\theta}{a(1 - \theta)}\, p(a - 1; n - 1)$

to obtain the alternative form


(4.6)  Rel. Eff.$_{\text{binomial}}(k) = P(k - 1; n - 1) + (n\theta - k)\, p(k - 1; n - 1)$

       $+ \dfrac{n\theta(1 - \theta)\, [p(k - 1; n - 1)]^2}{1 - P(k - 1; n - 1) + \theta\, p(k - 1; n - 1)}$,

which involves only the one binomial p. l., $p(x \mid n - 1, \theta)$.

As an example, consider the probability situation in which ten independent
trials are made, each with the same probability of success, $\theta$. The number of
successes in each set of ten trials is one observation. On the basis of $N$
observations, we are to estimate $\theta$. We shall investigate the relative efficiencies when
$\theta = .10$. Taking $n = 10$ and $\theta = .10$ in (4.6) we compute the following table of
relative efficiencies for different choices of $k$:


Relative efficiencies of $\chi^2$ estimates in the case of the binomial p. l., $n = 10$, $\theta = .10$


k | Rel. Eff.








It is obvious from the table that the loss in efficiency is not great when $k = 3$
and, hence, the variances of the $\chi^2$ estimates are practically equal to the variance
of the ML estimate. But, in ordinary practice, $N$, the number of sets of ten trials
each, would have to be over 140 before $k$ could be safely chosen as large as
$k = 3$, and even $k = 2$ requires $N = 38$. Cases in which we seek to estimate
parameters on the basis of about 100 observations are not rare; in the present
instance, use of a $\chi^2$ estimate would produce about 11% greater variance than
the use of a ML estimate.
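The relative efficiencies for this example can be recomputed from the single-law form of the binomial result. The sketch below assumes that form to be P(k-1; n-1) + (n theta - k) p(k-1; n-1) + n theta (1 - theta) p(k-1; n-1)^2 / (1 - P(k-1; n-1) + theta p(k-1; n-1)); the function names are hypothetical and the printed values are a recomputation, not the entries of the original table.

```python
import math

def binom_pmf(x, n, theta):
    """Binomial probability p(x; n); zero outside 0..n."""
    if x < 0 or x > n:
        return 0.0
    return math.comb(n, x) * theta**x * (1 - theta) ** (n - x)

def binomial_rel_eff(k, n, theta):
    """Relative efficiency when all counts x >= k are pooled (1 <= k <= n),
    using only the binomial law with n - 1 trials."""
    p = binom_pmf(k - 1, n - 1, theta)                        # p(k-1; n-1)
    P = sum(binom_pmf(x, n - 1, theta) for x in range(k))     # P(k-1; n-1)
    return (P + (n * theta - k) * p
            + n * theta * (1 - theta) * p**2 / (1.0 - P + theta * p))

for k in range(1, 6):
    print(k, round(binomial_rel_eff(k, 10, 0.10), 4))
```

For k = 2 the value is close to 0.90, whose reciprocal is about 1.11, in line with the "about 11% greater variance" quoted in the text.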

The two elementary examples considered in this section provide only very
fragmentary evidence of the need for caution in employing $\chi^2$-minimum
estimates; much numerical work would have to be done to provide any reliable guide
to the relative efficiency of such estimates.


5. Estimation of two or more parameters. Consider the p. l. $p(x \mid \theta_1, \theta_2, \cdots, \theta_r)$,
$x = 0, 1, 2, \cdots$, with ellipsoid of concentration for a set of joint
efficient estimates given by (2.3). The truncated p. l. given by (1.1) has a
corresponding ellipsoid of concentration


(2.3′)  $N \sum_{i=1}^{r} \sum_{j=1}^{r} b'_{ij} t_i t_j = r + 2$,

with $b'_{ij} = E\left[\dfrac{\partial \log f(x)}{\partial \theta_i} \, \dfrac{\partial \log f(x)}{\partial \theta_j}\right]$. We shall show, in this section, that the
ellipsoid (2.3) lies wholly within (2.3′); this is so if the left member of (2.3) is
uniformly greater than the left member of (2.3′), for every choice of the $t_i$, $i = 1, 2, \cdots, r$.
Accordingly, we form the difference,

(5.1)  $Q(k) = \sum_{i=1}^{r} \sum_{j=1}^{r} (b_{ij} - b'_{ij})\, t_i t_j$.


Adopting the notations,


$p_i(x) = \dfrac{\partial p(x)}{\partial \theta_i}$  and  $f_i(x) = \dfrac{\partial f(x)}{\partial \theta_i}$,

we obtain by direct subtraction,


(5.2)  $Q(k) = \sum_{i,j} \left[\sum_{x \ge k} \dfrac{p_i(x)\, p_j(x)}{p(x)} - \dfrac{f_i(k)\, f_j(k)}{f(k)}\right] t_i t_j$.


Equation (5.2) is unchanged if the right member is written in the form


(5.3)  $\sum_{i,j} \left[\sum_{x \ge k} \left\{\dfrac{p_i(x)\, p_j(x)}{p(x)} - \dfrac{f_i(k)}{f(k)}\, p_j(x) - \dfrac{f_j(k)}{f(k)}\, p_i(x) + \dfrac{f_i(k)\, f_j(k)}{f^2(k)}\, p(x)\right\}\right] t_i t_j$.


If this latter is now written as


(5.4)  $Q(k) = f(k) \sum_{i,j} \left[\sum_{x \ge k} \dfrac{p(x)}{f(k)} \left\{\left(\dfrac{p_i(x)}{p(x)} - \dfrac{f_i(k)}{f(k)}\right) \left(\dfrac{p_j(x)}{p(x)} - \dfrac{f_j(k)}{f(k)}\right)\right\}\right] t_i t_j$,

it is evident that the expression in square brackets in the right hand member is
precisely the mean value of the expression in curly brackets taken over the set
$x \ge k$. If we denote by $E_{x \ge k}\{g(x)\}$ the expected value of $g(x)$ over the set $x \ge k$,
we have

(5.5)  $Q(k) = f(k) \sum_{i,j} E_{x \ge k} \left\{\left(\dfrac{p_i(x)}{p(x)} - \dfrac{f_i(k)}{f(k)}\right) \left(\dfrac{p_j(x)}{p(x)} - \dfrac{f_j(k)}{f(k)}\right)\right\} t_i t_j$.


Finally, since the (finite) sum of the expected values is equal to the expected
value of the sum, we have,


(5.6)  $Q(k) = f(k)\, E_{x \ge k} \left[\left\{\sum_{i=1}^{r} \left(\dfrac{p_i(x)}{p(x)} - \dfrac{f_i(k)}{f(k)}\right) t_i\right\}^2\right]$.


Since $f(k) > 0$, $Q(k) \ge 0$. We need only note that $Q(k) = 0$ only if the linear form
in curly brackets in (5.6) is identically zero, i.e., if each coefficient of $t_i$ vanishes.
This can happen only in the trivial cases analogous to those described in
Section 3.

It has been shown that the ellipsoid of concentration of a set of joint
efficient estimates of the parameters of a p. l. lies wholly within the corresponding
ellipsoid of the truncated p. l. Therefore, the best procedure for estimating the
parameters of a truncated p. l. cannot attain the precision of an efficient
procedure for estimating those of the original p. l.

In order to complete the argument for the general case, we form the difference


(5.7)  $Q(k) - Q(k + 1) = \sum_{i,j} \left[\dfrac{p_i(k)\, p_j(k)}{p(k)} - \dfrac{f_i(k)\, f_j(k)}{f(k)} + \dfrac{f_i(k + 1)\, f_j(k + 1)}{f(k + 1)}\right] t_i t_j$.


Making use of the two relationships $f(k) = p(k) + f(k + 1)$ and
$f_i(k) = p_i(k) + f_i(k + 1)$, we have


(5.8)  $Q(k) - Q(k + 1) = \dfrac{p(k)\, f(k + 1)}{f(k)} \left\{\sum_{i=1}^{r} \left[\dfrac{p_i(k)}{p(k)} - \dfrac{f_i(k + 1)}{f(k + 1)}\right] t_i\right\}^2$.

The right member of (5.8) being positive except in the trivial cases, it is clear
that $Q(k)$ is a strictly monotone function of $k$.
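The reduction from the difference of quadratic forms to a single squared linear form rests on one elementary identity. The following worked step is a sketch of that algebra; the shorthand $a$, $b$, $u$, $v$ is introduced here for illustration only and does not appear in the paper.

```latex
% Shorthand (assumed for this sketch):
%   a = p(k),  b = f(k+1),  so that  a + b = f(k);
%   u = \sum_i p_i(k) t_i,  v = \sum_i f_i(k+1) t_i,
%   so that  u + v = \sum_i f_i(k) t_i.
\begin{align*}
Q(k) - Q(k+1)
  &= \frac{u^2}{a} + \frac{v^2}{b} - \frac{(u+v)^2}{a+b} \\
  &= \frac{b(a+b)u^2 + a(a+b)v^2 - ab(u+v)^2}{ab(a+b)} \\
  &= \frac{b^2u^2 - 2ab\,uv + a^2v^2}{ab(a+b)}
   = \frac{(bu - av)^2}{ab(a+b)} \\
  &= \frac{ab}{a+b}\left(\frac{u}{a} - \frac{v}{b}\right)^2 \;\ge\; 0 .
\end{align*}
```

With one parameter the same identity gives the squared expression obtained in section 3 after multiplying through by $ab(a+b)$.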


6. Conclusions. It has been shown that the efficiency of $\chi^2$-minimum estimates,
or any other estimates which involve computation in terms of a truncated p. l.,
is necessarily less than the efficiency of corresponding ML or other estimates
based on the original p. l. and, further, that the efficiency increases with the
point of truncation. This was established for estimates of a single parameter and,
also, for joint estimates of several parameters. Examples given indicate that,
in any case of regular estimation, use of $\chi^2$-minimum estimates rather than ML
estimates should be accompanied by an investigation into the loss in efficiency.

The author is indebted to Professor J. Neyman, who suggested the problem.


UNBIASED ESTIMATES WITH MINIMUM VARIANCE


By CHARLES STEIN


University of California, Berkeley


Summary. Subject to certain restrictions, a characterization of unbiased 
estimates with minimum variance is obtained. For two fairly broad classes 
of problems, solutions are given which are more readily applicable. These are 
used to obtain such estimates in some particular cases. The applicability of 
the results to problems of sequential estimation is pointed out. The problem 
of unbiased estimation is not at present of much practical importance, but 
is of some theoretical interest and has been treated by many statisticians. Also, 
the method used in this paper may be applicable to other problems in statistics. 


1. Introduction. Let $R$ be a space of points $x$, $B$ an additive class of subsets $C$
of $R$ and $\mu$ a measure over $B$ such that $R$ can be represented as the union of a
countable collection of elements of $B$ each of which has finite $\mu$-measure. Let $\Omega$
be a set called the parameter space and let $X$ be a random variable distributed
in accordance with the probability density function $p(x \mid \theta)$ for some $\theta \in \Omega$,
so that for any $C \in B$


$P\{X \in C \mid \theta\} = \int_C p(x \mid \theta)\, d\mu(x)$.


A measurable real-valued function $f(x)$ on $R$ is called an unbiased estimate of the
real-valued function $g(\theta)$ on $\Omega$ if, for every $\theta \in \Omega$


(1)  $E(f(X) \mid \theta) = \int f(x)\, p(x \mid \theta)\, d\mu(x) = g(\theta)$.


The problem considered in this paper is that of finding an unbiased estimate
$f^*$ of $g$ which minimizes the variance at $\theta_0$. Since this variance is


(2)  $E([f(X) - g(\theta_0)]^2 \mid \theta_0) = \int [f(x) - g(\theta_0)]^2\, p(x \mid \theta_0)\, d\mu(x)$

     $= \int [f(x)]^2\, p(x \mid \theta_0)\, d\mu(x) - \left[\int f(x)\, p(x \mid \theta_0)\, d\mu(x)\right]^2$,

this problem is equivalent to minimizing

(3)  $\int [f(x)]^2\, p(x \mid \theta_0)\, d\mu(x)$

subject to (1). It will be convenient to introduce the measure


(4)  $\nu(C) = \int_C p(x \mid \theta_0)\, d\mu(x)$

and the probability ratios


(5)  $\pi(x \mid \theta) = \dfrac{p(x \mid \theta)}{p(x \mid \theta_0)}$.
We suppose $\pi(x \mid \theta)$ finite for almost all $x$, and all $\theta$. When we say "for almost
all $x$," we mean "except for a set of $\mu$-measure 0."

In most practical problems, the set $R$ is a subset of some finite-dimensional
Euclidean space and $\mu$ is either ordinary Lebesgue measure or, in the case where
$R$ is countable, counting measure which makes the measure of a set the number
of points it contains. An exception is the application to sequential analysis
considered in section 3 below, in which $R$ is a countable union of sets, each of
which is a subset of a finite dimensional Euclidean space. For the basic notation
and concepts of the theory of integration see Saks [2], Ch. I.

We shall define


(6)  $A(\theta_1, \theta_2) = \int \pi(x \mid \theta_1)\, \pi(x \mid \theta_2)\, d\nu(x)$,

and suppose

(7)  $A(\theta, \theta) < \infty$ for all $\theta$.


By Schwarz's inequality this implies that $A(\theta_1, \theta_2) < \infty$ for all $\theta_1, \theta_2$. If (7)
is not true then it may happen that there exists no unbiased estimate with
minimum variance even though there exist unbiased estimates. Consider, for
example, the case where $\Omega$ consists of two points, 0 and 1, and $g(\theta) = \theta$, and


$p(x \mid 0) = 1$ for $0 < x < 1$, and $0$ otherwise,


$p(x \mid 1) = \tfrac{1}{2} x^{-1/2}$ for $0 < x < 1$, and $0$ otherwise,


and $\mu$ is ordinary Lebesgue measure. It is clear that there exist unbiased estimates
of $\theta$ with arbitrarily small positive variance at $\theta = 0$ but there exists none with 0
variance.

2. The principal theorem. In accordance with the usual terminology we
denote by $L_2$ the class of all measurable functions $\varphi$ such that


(8)  $\int [\varphi(x)]^2\, d\nu(x) < \infty$.

Finally, $G$ is the class of all functions $\psi$ expressible in the form

(9)  $\psi(\theta) = \int \varphi(x)\, \pi(x \mid \theta)\, d\nu(x)$ with $\varphi \in L_2$.


THEOREM 1. If $\pi(x \mid \theta)$ is finite for all $\theta$ and almost all $x$, and (7) is satisfied,
and there exists an unbiased estimate of $g$, then there exists an unbiased estimate
$f^*$ of $g$ which minimizes (3). If $f^*$ has finite variance then any other unbiased
estimate of $g$ with minimum variance at $\theta_0$ is essentially equal to $f^*$, that is, differs
from $f^*$ only on a set of $\mu$-measure 0. A function $f$ is an unbiased estimate of $g$ with
minimum variance at $\theta_0$ if and only if there exists a real-valued functional $T$ on $G$
for which


(10)  $T A(\theta, \theta_1) = g(\theta_1)$ for all $\theta_1 \in \Omega$,

(11)  $T \int \varphi(x)\, \pi(x \mid \theta)\, d\nu(x) = \int \varphi(x)\, f(x)\, d\nu(x)$ for all $\varphi \in L_2$.

(The preceding sentence does not assume the existence of an unbiased estimate of $g$.)


The minimum variance is $T g(\theta) - [g(\theta_0)]^2$.
PROOF. Let $\{f_i\}$ be a sequence of unbiased estimates of $g$ such that


$\lim_{i \to \infty} \int [f_i(x)]^2\, d\nu(x) = \text{g.l.b.}_f \int [f(x)]^2\, d\nu(x)$


where $f$ ranges over all unbiased estimates of $g$. Then by the weak compactness
of every sphere in $L_2$ (see [1], p. 10) there exists $f^* \in L_2$ and an increasing sequence
$\{n_i\}$ of integers for which


$\int \varphi f^*\, d\nu = \lim_{i \to \infty} \int \varphi f_{n_i}\, d\nu$ for all $\varphi \in L_2$.


Since $\pi(x \mid \theta) \in L_2$ by (7), this implies that $f^*$ is an unbiased estimate of $g$. Also


(12)  $\int [f^*]^2\, d\nu \le \lim_{i \to \infty} \int f_{n_i}^2\, d\nu = \text{g.l.b.}_f \int f^2\, d\nu$.


Thus $f^*$ is an unbiased estimate of $g$ with minimum variance.
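Let $\varphi_1 \in L_2$ be such that


(13)  $\int \varphi_1(x)\, \pi(x \mid \theta)\, d\nu(x) = 0$ for all $\theta \in \Omega$.

Then, using the $f^*$ defined in the last paragraph, we obtain for any real $\epsilon$

(14)  $0 \le \int (f^* + \epsilon \varphi_1)^2\, d\nu - \int [f^*]^2\, d\nu = 2\epsilon \int \varphi_1 f^*\, d\nu + \epsilon^2 \int \varphi_1^2\, d\nu$


since $f^* + \epsilon \varphi_1$ is an unbiased estimate of $g$. Dividing (14) by $\epsilon$ and letting $\epsilon \to 0$
we obtain


(15)  $\int \varphi_1 f^*\, d\nu = 0$.


If a function in $G$ can be represented in two ways,


$\int \varphi(x)\, \pi(x \mid \theta)\, d\nu(x) = \int \varphi'(x)\, \pi(x \mid \theta)\, d\nu(x)$,

then $\varphi_1 = \varphi' - \varphi$ satisfies (13) and consequently (15). Thus (11) defines a
functional on $G$ in a consistent way. Also, this functional satisfies (10) since


$T A(\theta, \theta_1) = T \int \pi(x \mid \theta)\, \pi(x \mid \theta_1)\, d\nu(x)$


$= \int \pi(x \mid \theta_1)\, f^*(x)\, d\nu(x) = g(\theta_1)$.


By (2) and (11) the minimum variance is


$\int [f^*(x)]^2\, d\nu(x) - [g(\theta_0)]^2 = T \int f^*(x)\, \pi(x \mid \theta)\, d\nu(x) - [g(\theta_0)]^2$


$= T g(\theta) - [g(\theta_0)]^2$.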


To prove the converse, let $f^*$ be any function in $L_2$ for which there exists a
functional $T$ satisfying (10) and (11). By (11) with $\varphi(x) = \pi(x \mid \theta_1)$,


$\int f^*(x)\, \pi(x \mid \theta_1)\, d\nu(x) = T \int \pi(x \mid \theta)\, \pi(x \mid \theta_1)\, d\nu(x)$


$= T A(\theta, \theta_1) = g(\theta_1)$


by (10), so that $f^*$ is an unbiased estimate of $g$. Any other unbiased estimate $f$
of $g$ with finite variance at $\theta_0$ is an element of $L_2$. Thus from (1) and (11)
we obtain


$T g(\theta) = \int f f^*\, d\nu$


$= \int [f^*]^2\, d\nu$.

Applying Schwarz's inequality to the middle expression we obtain

$\int [f^*]^2\, d\nu \le \int f^2\, d\nu$


with strict inequality unless $f$ is essentially equal to $f^*$.

COROLLARY 1. Suppose $\pi(x \mid \theta)$ is finite for all $\theta$ and almost all $x$ and (7) holds.
Let $H_1(x, d)$ be the set of all $\theta \in \Omega$ such that $\pi(x \mid \theta) > d$, and let $H$ be the smallest
additive class containing all $H_1(x, d)$. Suppose there exists an additive set function $\lambda$
over $H$ such that there exists a finite collection of parameter points $\theta_k$ and positive
numbers $c_k$ such that


(16)  $\int \pi(x \mid \theta)\, |d\lambda(\theta)| \le \sum_k c_k\, \pi(x \mid \theta_k)$


for almost all $x$, and


(17)  $\int A(\theta, \theta_1)\, d\lambda(\theta) = g(\theta_1)$.

Then the unbiased estimate of $g(\theta)$ with minimum variance at $\theta_0$ is

(18)  $f^*(x) = \int \pi(x \mid \theta)\, d\lambda(\theta)$

and the minimum variance is

(19)  $\int g(\theta)\, d\lambda(\theta) - [g(\theta_0)]^2$.

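PROOF: We need only show that (10) and (11) are satisfied by


$T \psi(\theta) = \int \psi(\theta)\, d\lambda(\theta)$

and (18). But

(20)  $T A(\theta, \theta_1) = \int A(\theta, \theta_1)\, d\lambda(\theta) = g(\theta_1)$

by (17) and


(21)  $T \int \varphi(x)\, \pi(x \mid \theta)\, d\nu(x) = \int d\lambda(\theta) \int \varphi(x)\, \pi(x \mid \theta)\, d\nu(x)$

      $= \int \varphi(x)\, d\nu(x) \int \pi(x \mid \theta)\, d\lambda(\theta) = \int \varphi(x)\, f^*(x)\, d\nu(x)$.


Since each of the functions $\varphi(x)$, $\pi(x \mid \theta)$ considered as a function of $x$ and $\theta$ is
measurable ($BH$), their product is also. The interchange of order of integration
in (21) is justified by Fubini's Theorem (Saks [2], p. 87) and (16) which by (9)
implies that $\int |d\lambda(\theta)| \int |\varphi(x)\, \pi(x \mid \theta)|\, d\nu(x) < \infty$. The equations (20) and (21)
are equivalent to (10) and (11) respectively.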

COROLLARY 2. Suppose $\pi(x \mid \theta)$ is finite for all $\theta$ and almost all $x$ and (7) holds.
Suppose also that $\Omega$ is a set of real numbers and:

(i) for some $m$, either a positive integer or $+\infty$, $\pi(x \mid \theta)$ is, for almost all $x$,
differentiable $m$ times with respect to $\theta$ at $\theta = \theta_0$,

(ii) for each $n < m$ there exists a finite collection of parameter values $\theta_{n,k}$, and
positive constants $c_{n,k}$ such that


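(22)  $\left| \dfrac{1}{\delta} \left[ \dfrac{\partial^n \pi(x \mid \theta)}{\partial \theta^n} \Big|_{\theta = \theta_0 + \delta} - \dfrac{\partial^n \pi(x \mid \theta)}{\partial \theta^n} \Big|_{\theta = \theta_0} \right] \right| \le \sum_k c_{n,k}\, \pi(x \mid \theta_{n,k})$


for all $\delta$ whose absolute value is sufficiently small and almost all $x$,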
(iii) there exist constants $a_n$ such that for all $\theta_1$,


(23)  $g(\theta_1) = \sum_{n=0}^{m} a_n\, \dfrac{\partial^n}{\partial \theta^n} A(\theta, \theta_1) \Big|_{\theta = \theta_0}$,


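(iv) there exists a finite collection of parameter values $\theta_k$ and positive constants
$c_k$ such that


(24)  $\left| \sum_{n=0}^{m} a_n\, \dfrac{\partial^n}{\partial \theta^n} \pi(x \mid \theta) \Big|_{\theta = \theta_0} \right| \le \sum_k c_k\, \pi(x \mid \theta_k)$.

Then the unbiased estimate of $g(\theta)$ with minimum variance at $\theta_0$ is

(25)  $f^*(x) = \sum_{n=0}^{m} a_n\, \dfrac{\partial^n}{\partial \theta^n} \pi(x \mid \theta) \Big|_{\theta = \theta_0}$.


The minimum variance is


$\sum_{n=0}^{m} a_n\, \dfrac{\partial^n g(\theta)}{\partial \theta^n} \Big|_{\theta = \theta_0} - [g(\theta_0)]^2$.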

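PROOF. We need only show that the functional $T$ defined by


(26)  $T \int \varphi(x)\, \pi(x \mid \theta)\, d\nu(x) = \sum_{n=0}^{m} a_n\, \dfrac{\partial^n}{\partial \theta^n} \int \varphi(x)\, \pi(x \mid \theta)\, d\nu(x) \Big|_{\theta = \theta_0}$


satisfies (10) and (11) with $f^*$ given by (25). Equation (23) yields (10)
immediately. Also


$T \int \varphi(x)\, \pi(x \mid \theta)\, d\nu(x) = \sum_{n=0}^{m} a_n\, \dfrac{\partial^n}{\partial \theta^n} \int \varphi(x)\, \pi(x \mid \theta)\, d\nu(x) \Big|_{\theta = \theta_0}$


$= \sum_{n=0}^{m} a_n \int \varphi(x)\, \dfrac{\partial^n}{\partial \theta^n} \pi(x \mid \theta) \Big|_{\theta = \theta_0}\, d\nu(x)$


by (9), (i), (22) and Lebesgue's Theorem on term by term integration (Saks
[2], p. 29). Using (24) and Lebesgue's Theorem, we find that this is equal to


$\int \varphi(x) \sum_{n=0}^{m} a_n\, \dfrac{\partial^n}{\partial \theta^n} \pi(x \mid \theta) \Big|_{\theta = \theta_0}\, d\nu(x) = \int \varphi(x)\, f^*(x)\, d\nu(x)$,


which completes the proof.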

There is an obvious combination of Corollaries 1 and 2 which will not be 
stated explicitly. Also Corollary 2 can be extended to involve differentiation 
with respect to several parameters. It would be of considerable interest to 
obtain a characterization of all possible functionals T in terms of the usual 
operations such as integration and differentiation. Also, the methods used here 
should be applicable, with some modifications, to other problems of minimization 
subject to an infinite set of side conditions. 

COROLLARY 3. Suppose that subject to the condition of Theorem 1, for $i = 1, 2$,
$f_i^*$ are unbiased estimates of $g_i$ with minimum variance at $\theta_0$. Then $f_1^* + f_2^*$ is
an unbiased estimate of $g_1 + g_2$ with minimum variance at $\theta_0$.

This follows immediately from (11) and (12) in Theorem 1. Actually, the
restriction to problems satisfying the conditions of Theorem 1 is unnecessary,
but we shall not prove this here.


3. Some special cases. We first consider a problem which is of little practical
interest but serves well as an illustration of Corollary 1. Let $X$ be a single
observation from a uniform distribution on the interval $(\theta, \theta + 1)$, i.e.


$p(x \mid \theta) = 1$ if $\theta < x < \theta + 1$, and $0$ otherwise.

We suppose $\theta$ lies in the interval $(-N, N - 1)$ where $N$ is a given positive
integer, and take as the distribution for which the variance is to be minimized

$$p(x \mid \theta_0) = \begin{cases} \dfrac{1}{2N} & \text{if } -N < x < N \\ 0 & \text{otherwise.} \end{cases}$$


This is the same as using the original p.d.f. $p(x \mid \theta)$ with $\theta$ a random variable
taking on the values $-N, -N + 1, \cdots, N - 1$ with equal probability. The
measure $\mu$ is of course ordinary Lebesgue measure. Then

$$\pi(x \mid \theta) = \begin{cases} 2N & \text{if } \theta < x < \theta + 1 \\ 0 & \text{otherwise} \end{cases} \tag{27}$$


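and

$$\frac{1}{2N} A(\theta_1, \theta_2) = \begin{cases} 0 & \text{if } \theta_1 < \theta_2 - 1 \\ \theta_1 - \theta_2 + 1 & \text{if } \theta_2 - 1 \le \theta_1 \le \theta_2 \\ \theta_2 - \theta_1 + 1 & \text{if } \theta_2 \le \theta_1 \le \theta_2 + 1 \\ 0 & \text{if } \theta_2 + 1 < \theta_1. \end{cases} \tag{28}$$

For $-N < \theta_1 < N - 1$, equation (17) becomes

$$\int_{\max(-N,\, \theta_1 - 1)}^{\theta_1} (\theta - \theta_1 + 1)\, d\lambda(\theta) + \int_{\theta_1}^{\min(N-1,\, \theta_1 + 1)} (\theta_1 - \theta + 1)\, d\lambda(\theta) = g(\theta_1)/2N \tag{29}$$

and (18) becomes

$$f^*(x)/2N = \lambda(\min[N - 1, x]) - \lambda(\max[-N, x - 1]). \tag{30}$$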


The reader will not be confused by the use of $\lambda$ as a point function here, and
as a set function in Corollary 1. Using (30) and integration by parts (Saks [2],
p. 102) we can rewrite (29) as

$$\int_{\theta_1}^{\theta_1 + 1} f^*(x)\, dx = g(\theta_1), \tag{31}$$

which is merely the condition that $f^*$ be an unbiased estimate of $g$. It is clear
from (31) that $g$ admits an unbiased estimate if and only if it is absolutely
continuous. Differentiating (31) we obtain

$$f^*(\theta + 1) - f^*(\theta) = g'(\theta). \tag{32}$$

Consequently the general solution of (31) is


$$f^*(\theta) = \sum_{i=1}^{[\theta]+N} g'(\theta - i) + \gamma(\theta), \tag{33}$$

where $\gamma$ is a function of period 1 such that

$$\int_0^1 \gamma(\theta)\, d\theta = 0. \tag{34}$$

Here, contrary to the usual convention, $[\theta]$ denotes the largest integer less than $\theta$.
The solution of the form (33) which minimizes the variance at $\theta_0$ is determined by the condition
that there exist $\lambda$ satisfying (30). Let $y$ be any number on the half-closed interval
$(-N, -N + 1]$, and sum (30) for $x = y, y + 1, \cdots, y + 2N - 1$. This yields

$$\frac{1}{2N} \sum_{j=0}^{2N-1} f^*(y + j) = \lambda(N - 1) - \lambda(-N). \tag{35}$$

Carrying out the same computation on (33) we obtain

$$\frac{1}{2N} \sum_{j=1}^{2N-1} \sum_{i=1}^{j} g'(y + j - i) + \gamma(y) = \lambda(N - 1) - \lambda(-N). \tag{36}$$
Combining (34) and (35) we find that the proper choice of $\gamma$ is that which gives


{[z]+N+1 1+¢ 


f*(x) = 2 sy 7 @ - el - N +9) 


2N—2 1+: 

> Ca 
1 2nN—1 

+ ae ~ {9G — N) — g(-N)}. 


1 


t=z2+N 


-1)9@-tl-N+a 


If the limit of (37) as $N \to \infty$ exists, it agrees with Nörlund's simplest definitions
of the principal solution of (32) (see Milne-Thomson [3], formula (2), p. 201)
whenever the latter is applicable. The author has not checked the agreement
with Nörlund's more general definitions.

Next we consider the problem of obtaining an unbiased estimate of $g(\theta)$ with
minimum variance at $\theta_0$ when $X$ consists of $n$ independent observations, each
uniformly distributed over the interval $(0, \theta)$. Here $\theta$ is an unknown positive
number. The result is independent of the choice of $\theta_0$. Clearly a necessary and
sufficient condition for the existence of an unbiased estimate of $g$ is that $g$ be
absolutely continuous. Corollary 1 can be applied to obtain as the best unbiased
estimate $g(Y) + \frac{Y}{n} g'(Y)$ where $Y = \max(X_1, \cdots, X_n)$. However, this result can
be obtained much more simply by observing that, given any sufficient statistic $Z$,
there exists an unbiased estimate with minimum variance which is a function
only of $Z$. A proof of this is given by Blackwell [4]. But $Y$ is a sufficient statistic,
and the condition that $f^*(Y)$ be an unbiased estimate of $g$ is that

$$\frac{n}{\theta^n} \int_0^{\theta} f^*(y)\, y^{n-1}\, dy = g(\theta).$$

This has as its unique solution that given above.
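
As a numerical illustration of the estimate $g(Y) + (Y/n)\,g'(Y)$, the following Python sketch evaluates it on a sample and checks unbiasedness by a midpoint-rule integration against the density $n y^{n-1}/\theta^n$ of $Y$. The function names, the choice $g(\theta) = \theta^2$ and the number of integration steps are illustrative assumptions and are not part of the original argument.

```python
def best_unbiased_estimate(g, g_prime, sample):
    """Evaluate g(Y) + (Y/n) g'(Y), where Y is the sample maximum."""
    n = len(sample)
    y = max(sample)
    return g(y) + (y / n) * g_prime(y)


def expected_estimate(g, g_prime, theta, n, steps=20_000):
    """Midpoint-rule approximation of E_theta[f*(Y)].

    Y = max(X_1, ..., X_n) has density n * y**(n - 1) / theta**n on (0, theta).
    """
    h = theta / steps
    total = 0.0
    for k in range(steps):
        y = (k + 0.5) * h
        f_star = g(y) + (y / n) * g_prime(y)
        total += f_star * n * y ** (n - 1) / theta ** n * h
    return total


if __name__ == "__main__":
    g = lambda t: t ** 2          # illustrative choice of g (an assumption)
    g_prime = lambda t: 2.0 * t   # its derivative
    print(best_unbiased_estimate(g, g_prime, [0.4, 1.7, 2.9, 1.1]))
    # The expectation should be close to g(3) = 9 if the estimate is unbiased.
    print(expected_estimate(g, g_prime, theta=3.0, n=4))
```

The integration check is only as good as the quadrature; it is meant to make the unbiasedness condition concrete, not to replace the argument via the sufficient statistic.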

A similar situation holds when the $X_i$, $i = 1, \cdots, n$, are independently normally
distributed with unknown common mean $\theta$ and unit variance. Here Corollary 1
is not applicable, but Corollary 2 is. The result can again be obtained more
simply as the unique solution of the integral equation

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f_0^*(y)\, e^{-\frac{1}{2}(y - \theta\sqrt{n})^2}\, dy = g(\theta)$$

with

$$f^*(x_1, \cdots, x_n) = f_0^*(y), \qquad y = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} x_i.$$


It should be observed that the methods of section 2 are applicable also to
problems of sequential estimation. Let $X_1, X_2, \cdots$ be a sequence of real-valued
random variables such that $(X_1, \cdots, X_m)$ have the joint p.d.f. $p_m(x_1, \cdots, x_m \mid \theta)$
for some unknown $\theta \in \Omega$. Suppose it has been decided to terminate the procedure
on the $m$-th observation if $(X_1, \cdots, X_m) \in R_m$ for some given sets $R_m$ in $m$-space,
and suppose these sets are so chosen that the probability of termination is 1
for all $\theta$. Then we can define the space $R = \bigcup_m R_m$, the union of the $R_m$, the
measure

$$\mu(A) = \sum_{m=1}^{\infty} \mu_m(A \cap R_m)$$

for any set $A \subset R$ for which the intersections $A \cap R_m$ are Borel sets, where $\mu_m$ is
ordinary $m$-dimensional Lebesgue measure, and the probability density functions

$$p(x \mid \theta) = p_m(x_1, \cdots, x_m \mid \theta) \quad \text{if } x = (x_1, \cdots, x_m) \in R_m.$$

The previous results are then applicable. Most of the familiar results in the
theory of statistical inference can be extended to sequential problems in the
same way. Of course the interesting and difficult problems of sequential analysis
are usually concerned chiefly with the appropriate choice of the regions $R_m$.


4. Connections with the work of other authors. Many lower bounds for the 
variance of an unbiased estimate were obtained by Bhattacharyya [5], and 
some results were obtained earlier by others whose results are referred to by 
Bhattacharyya. His work has been extended to sequential problems as indicated 
in section 3 above by G. R. Seth in a doctoral dissertation at Columbia Uni- 
versity. This leads to results analogous to, but in some respects more general 
than those of Wolfowitz [6]. Among other papers on sequential estimation, 
there are the one by Blackwell [4] already referred to, and the one by Girshick,
Mosteller, and Savage [7]. These deal mainly with problems in which there is a
unique unbiased estimate based on a sufficient statistic.

The author is indebted to A. Wald, J. L. Hodges, E. Barankin, and H. Rubin
for some helpful suggestions and comments.
DISTRIBUTION OF MAXIMUM AND MINIMUM FREQUENCIES IN A
SAMPLE DRAWN FROM A MULTINOMIAL DISTRIBUTION


By ROBERT E. GREENWOOD AND MARK O. GLASGOW
University of Texas


1. Introduction. In this paper, the expected values

$$E\left[ \begin{matrix} \max \\ \min \end{matrix} (n_1, n_2, \cdots, n_l) \right] = \sum_{n_1 + n_2 + \cdots + n_k = N} \frac{N!}{n_1!\, n_2! \cdots n_k!} \left[ \begin{matrix} \max \\ \min \end{matrix} (n_1, n_2, \cdots, n_l) \right] p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k} \tag{1.1}$$

will be studied. The quantities $\{n_i\}$, $i = 1, 2, \cdots, k$, are understood to be
non-negative integers, and the quantities $\{p_i\}$ are non-negative probabilities,
$\sum p_i = 1$. Also, $1 \le l \le k$. Form (1.1) will be evaluated for the binomial case $l = k = 2$
and for the special trinomial case $p_1 = p_2$ with $l = 2$, $k = 3$.


2. Binomial distribution. The evaluations for the expected values in the
binomial case can be given explicitly in terms of the incomplete Beta function.
This function may be defined by the relation

$$I_x(n - k, k + 1) = \sum_{r=0}^{k} \binom{n}{r} (1 - x)^r x^{n-r}, \tag{2.1}$$

whence

$$I_{1-x}(k + 1, n - k) = \sum_{r=k+1}^{n} \binom{n}{r} (1 - x)^r x^{n-r}.$$

It is seen that

$$I_x(n - k, k + 1) + I_{1-x}(k + 1, n - k) = 1. \tag{2.2}$$

For the binomial case, $n_2 = N - n_1$ and $p_2 = 1 - p_1$, and thus instead of
$(n_1, n_2)$ and $(p_1, p_2)$ one may use $(n, N - n)$ and $(p, 1 - p)$ without any subscripts
and without sacrifice of clarity. This will be done in some instances in what
follows. The evaluation of


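$$E\left[ \begin{matrix} \max \\ \min \end{matrix} (n_1, n_2) \right] = \sum_{n=0}^{N} \binom{N}{n} \left[ \begin{matrix} \max \\ \min \end{matrix} (n, N - n) \right] p^n (1 - p)^{N-n} \tag{2.3}$$

is slightly different for the two cases $N$ odd and $N$ even.
For $N$ odd, and for the minimum form, the summation may be written in two
parts, (a) and (b),

(a) $0 \le n \le \frac{N-1}{2}$,

in which range $\min(n, N - n) = n$, and

(b) $\frac{N+1}{2} \le n \le N$,

in which range $\min(n, N - n) = N - n$. In the (a) part summation one gets

$$\sum_{n=0}^{(N-1)/2} n \binom{N}{n} p^n (1 - p)^{N-n} = Np \sum_{n=1}^{(N-1)/2} \binom{N-1}{n-1} p^{n-1} (1 - p)^{N-n} = Np \sum_{n=0}^{(N-3)/2} \binom{N-1}{n} p^n (1 - p)^{N-1-n} = Np\, I_{1-p}\left( \frac{N+1}{2}, \frac{N-1}{2} \right).$$

In the (b) part summation one gets

$$\sum_{n=(N+1)/2}^{N} (N - n) \binom{N}{n} p^n (1 - p)^{N-n} = N(1 - p) \sum_{n=(N+1)/2}^{N-1} \binom{N-1}{n} p^n (1 - p)^{N-n-1} = N(1 - p)\, I_p\left( \frac{N+1}{2}, \frac{N-1}{2} \right).$$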
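Similar algebraic manipulations, supplemented by symmetry, can be used to
effect the evaluations tabulated below.
For $N$ odd there result the forms

$$\begin{aligned} E[\min(n_1, n_2)] &= Np\, I_{1-p}\left( \frac{N+1}{2}, \frac{N-1}{2} \right) + N(1 - p)\, I_p\left( \frac{N+1}{2}, \frac{N-1}{2} \right), \\ E[\max(n_1, n_2)] &= Np\, I_p\left( \frac{N-1}{2}, \frac{N+1}{2} \right) + N(1 - p)\, I_{1-p}\left( \frac{N-1}{2}, \frac{N+1}{2} \right). \end{aligned} \tag{2.4}$$

For $N$ even there result the forms

$$\begin{aligned} E[\min(n_1, n_2)] &= Np\, I_{1-p}\left( \frac{N}{2}, \frac{N}{2} \right) + N(1 - p)\, I_p\left( \frac{N}{2} + 1, \frac{N}{2} - 1 \right), \\ E[\max(n_1, n_2)] &= Np\, I_p\left( \frac{N}{2}, \frac{N}{2} \right) + N(1 - p)\, I_{1-p}\left( \frac{N}{2} - 1, \frac{N}{2} + 1 \right). \end{aligned} \tag{2.5}$$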
For this simple binomial case, $\max(n_1, n_2) + \min(n_1, n_2) = N$ and linearity
in the expected value operator used in (2.3) preserves this relation, so that one
obtains

$$E[\min(n_1, n_2)] + E[\max(n_1, n_2)] = N. \tag{2.6}$$

Thus (2.6) and (2.2) could have been used in evaluating some of the forms above,
or can be used as a check on the evaluations.
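
The check suggested above can be carried out numerically. The following Python sketch computes the expected values by direct summation of (2.3) and, for odd $N$, from the incomplete Beta form of $E[\min(n_1, n_2)]$, with the incomplete Beta function evaluated as a binomial upper tail. All function names are illustrative assumptions; only the standard library is used.

```python
from math import comb


def binomial_pmf(N, n, p):
    """Probability that a Binomial(N, p) variable equals n."""
    return comb(N, n) * p ** n * (1.0 - p) ** (N - n)


def expected_min_max(N, p):
    """Return (E[min(n, N - n)], E[max(n, N - n)]) by direct summation of (2.3)."""
    e_min = sum(min(n, N - n) * binomial_pmf(N, n, p) for n in range(N + 1))
    e_max = sum(max(n, N - n) * binomial_pmf(N, n, p) for n in range(N + 1))
    return e_min, e_max


def upper_tail(N, k, p):
    """P(Binomial(N, p) >= k), which equals I_p(k, N - k + 1) for 1 <= k <= N."""
    return sum(binomial_pmf(N, r, p) for r in range(k, N + 1))


def expected_min_odd(N, p):
    """E[min(n, N - n)] for odd N from the incomplete Beta form.

    Both incomplete Beta values have arguments ((N + 1)/2, (N - 1)/2), so each
    is an upper tail of a Binomial(N - 1, .) distribution at k = (N + 1)/2.
    """
    k = (N + 1) // 2
    return N * p * upper_tail(N - 1, k, 1.0 - p) + N * (1.0 - p) * upper_tail(N - 1, k, p)


if __name__ == "__main__":
    e_min, e_max = expected_min_max(11, 0.3)
    print(e_min, e_max, e_min + e_max)   # the sum should equal N = 11
    print(expected_min_odd(11, 0.3))     # should agree with e_min
```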
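To compute the variance

$$\sigma^2(x) = \sum [x - E(x)]^2 f(x) = E[x^2] - \{E[x]\}^2, \tag{2.7}$$

it will be convenient to note that for the binomial case

$$\sigma^2_{\max} = \sigma^2_{\min}, \tag{2.8}$$

where

$$\sigma^2_{\max \atop \min} = E\left[ \begin{matrix} \max \\ \min \end{matrix} (n_1^2, n_2^2) \right] - \left\{ E\left[ \begin{matrix} \max \\ \min \end{matrix} (n_1, n_2) \right] \right\}^2, \tag{2.9}$$

and where because of the non-negative character of $n_1$ and $n_2$

$$E\left[ \begin{matrix} \max \\ \min \end{matrix} (n_1^2, n_2^2) \right] = E\left[ \left\{ \begin{matrix} \max \\ \min \end{matrix} (n_1, n_2) \right\}^2 \right].$$

To prove (2.8), note that for this binomial case

$$\{\max(n_1, n_2) - E[\max(n_1, n_2)]\}^2 = \{\min(n_1, n_2) - E[\min(n_1, n_2)]\}^2,$$

and thus each term for $\sigma^2_{\max}$ has its counterpart for $\sigma^2_{\min}$ when using the first part
of (2.7) to compute these variances, and hence (2.8) must be true.
Defining $\sigma^2$ as the common value, one gets

$$\begin{aligned} 2\sigma^2 &= \sigma^2_{\max} + \sigma^2_{\min} \\ &= E[\max(n_1^2, n_2^2)] + E[\min(n_1^2, n_2^2)] - \{E[\max(n_1, n_2)]\}^2 - \{E[\min(n_1, n_2)]\}^2. \end{aligned} \tag{2.10}$$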
The value of the sum

$$E[\max(n_1^2, n_2^2)] + E[\min(n_1^2, n_2^2)]$$

is somewhat easier to obtain than that of either part. For, $\max(n_1^2, n_2^2)$ is one
of the integers $(n_1^2, n_2^2)$ and $\min(n_1^2, n_2^2)$ is the other integer. Linearity in the
expected value form then gives

$$\begin{aligned} E[\max(n_1^2, n_2^2)] + E[\min(n_1^2, n_2^2)] &= E[n^2 + (N - n)^2] \\ &= N^2 p^2 + 2Np(1 - p) + N^2 (1 - p)^2, \end{aligned} \tag{2.11}$$

a relation which is similar to (2.6).
Likewise one gets

$$\begin{aligned} \{E[\max(n_1, n_2)]\}^2 + \{E[\min(n_1, n_2)]\}^2 &= \{E[\max(n_1, n_2)] + E[\min(n_1, n_2)]\}^2 - 2E[\max(n_1, n_2)]\,E[\min(n_1, n_2)] \\ &= N^2 - 2E[\max(n_1, n_2)]\,E[\min(n_1, n_2)]. \end{aligned} \tag{2.12}$$
Substituting the results of (2.11) and (2.12) into (2.10), and solving for $\sigma^2$ one
gets

$$\begin{aligned} \sigma^2 &= E[\max(n_1, n_2)]\,E[\min(n_1, n_2)] - N(N - 1)p(1 - p) \\ &= E[\max(n_1, n_2)]\,\{N - E[\max(n_1, n_2)]\} - N(N - 1)p(1 - p) \\ &= E[\min(n_1, n_2)]\,\{N - E[\min(n_1, n_2)]\} - N(N - 1)p(1 - p). \end{aligned} \tag{2.13}$$

If one desires, one can make independent evaluations of $E[\max(n_1^2, n_2^2)]$ and
$E[\min(n_1^2, n_2^2)]$ and compute the variances from relation (2.9). Such evaluations
bring into play the incomplete Beta functions at four different sets of values,
with separate sets for $N$ odd and $N$ even. Relations (2.13) seem preferable to
this suggested "strong-arm" procedure. A proof of relation (2.8) by this means
seems to be unduly algebraically complicated.
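
The relation between the common variance and the two expected values can be checked against a direct computation. The sketch below, with illustrative function names chosen here, evaluates both variances from their definitions and compares them with $E[\max]\,E[\min] - N(N-1)p(1-p)$.

```python
from math import comb


def moments_of_min_max(N, p):
    """Exact means and variances of min(n, N - n) and max(n, N - n), n ~ Binomial(N, p)."""
    pmf = [comb(N, n) * p ** n * (1.0 - p) ** (N - n) for n in range(N + 1)]
    e_min = sum(min(n, N - n) * w for n, w in enumerate(pmf))
    e_max = sum(max(n, N - n) * w for n, w in enumerate(pmf))
    var_min = sum((min(n, N - n) - e_min) ** 2 * w for n, w in enumerate(pmf))
    var_max = sum((max(n, N - n) - e_max) ** 2 * w for n, w in enumerate(pmf))
    return e_min, e_max, var_min, var_max


def variance_from_product(N, p, e_min, e_max):
    """Common variance computed as E[max] E[min] - N (N - 1) p (1 - p)."""
    return e_max * e_min - N * (N - 1) * p * (1.0 - p)


if __name__ == "__main__":
    e_min, e_max, var_min, var_max = moments_of_min_max(12, 0.35)
    print(var_min, var_max)                              # equal up to rounding
    print(variance_from_product(12, 0.35, e_min, e_max)) # should match both
```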














3. Normal approximation to the binomial distribution. If numerical values for
large $N$ are desired (beyond the range of tabulated values of the incomplete
Beta Function) an approximation based on the normal distribution may be used.
Let

$$n_1 = Np_1 + x, \qquad n_2 = N - n_1 = N(1 - p_1) - x, \tag{3.1}$$

where the subscripts may be dropped when not needed for clarity.

Then one has

$$E\left[ \begin{matrix} \max \\ \min \end{matrix} (n_1, n_2) \right] \cong \int_{-\infty}^{\infty} \left[ \begin{matrix} \max \\ \min \end{matrix} (x + Np,\, N(1 - p) - x) \right] \frac{1}{\sqrt{2\pi N p(1 - p)}} \exp\left( \frac{-x^2}{2Np(1 - p)} \right) dx. \tag{3.2}$$

To evaluate the minimum approximation, note that there are two ranges

(a) $-\infty < x \le \frac{N}{2} - Np$,

in which range $\min(x + Np,\, N(1 - p) - x) = x + Np$, and

(b) $\frac{N}{2} - Np \le x < \infty$,

in which range $\min(x + Np,\, N(1 - p) - x) = N(1 - p) - x$. Defining

$$A(t) = \int_{-\infty}^{t} \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{x^2}{2} \right) dx, \tag{3.3}$$
a tabulated function, the integrations may be evaluated as

$$\begin{aligned} E[\min(n_1, n_2)] &= Np\,A(M) + N(1 - p)[1 - A(M)] - \left[ \frac{2Np(1 - p)}{\pi} \right]^{1/2} \exp\left[ \frac{-N(1 - 2p)^2}{8p(1 - p)} \right], \\ E[\max(n_1, n_2)] &= N(1 - p)\,A(M) + Np[1 - A(M)] + \left[ \frac{2Np(1 - p)}{\pi} \right]^{1/2} \exp\left[ \frac{-N(1 - 2p)^2}{8p(1 - p)} \right], \end{aligned} \tag{3.4}$$

where

$$M = \frac{N/2 - Np}{\sqrt{Np(1 - p)}}.$$

Note also that (2.6) holds for these approximate evaluations.
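
To give a feeling for the quality of the approximation, the following Python sketch evaluates the normal approximations to $E[\min(n_1, n_2)]$ and $E[\max(n_1, n_2)]$ and compares the former with the exact binomial sum. The function names and the test values of $N$ and $p$ are illustrative assumptions; the normal distribution function is obtained from the standard library's `math.erf`.

```python
from math import comb, erf, exp, pi, sqrt


def normal_cdf(t):
    """A(t): the standard normal distribution function."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))


def approx_min_max(N, p):
    """Normal approximations to E[min(n1, n2)] and E[max(n1, n2)]."""
    M = (N / 2.0 - N * p) / sqrt(N * p * (1.0 - p))
    A = normal_cdf(M)
    tail = sqrt(2.0 * N * p * (1.0 - p) / pi) * exp(
        -N * (1.0 - 2.0 * p) ** 2 / (8.0 * p * (1.0 - p))
    )
    e_min = N * p * A + N * (1.0 - p) * (1.0 - A) - tail
    e_max = N * (1.0 - p) * A + N * p * (1.0 - A) + tail
    return e_min, e_max


def exact_min(N, p):
    """Exact E[min(n, N - n)] by direct summation, for comparison."""
    return sum(
        min(n, N - n) * comb(N, n) * p ** n * (1.0 - p) ** (N - n)
        for n in range(N + 1)
    )


if __name__ == "__main__":
    for p in (0.5, 0.4):
        a_min, a_max = approx_min_max(100, p)
        print(p, a_min, exact_min(100, p), a_min + a_max)
```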

For the variance, approximations (3.4) may be used in relations (2.13). Or,
alternately, the variances may be computed by "strong-arm" methods using
the definition (2.9). In this case, using the averaging defined implicitly by (2.10)
one gets the evaluation

$$\begin{aligned} \sigma^2 \cong{} & N^2 A(M)[1 - A(M)][1 - 2p]^2 + Np(1 - p) \\ & + N(1 - 2p)[1 - 2A(M)] \left[ \frac{2Np(1 - p)}{\pi} \right]^{1/2} \exp\left[ \frac{-N(1 - 2p)^2}{8p(1 - p)} \right] \\ & - \frac{2Np(1 - p)}{\pi} \exp\left[ \frac{-N(1 - 2p)^2}{4p(1 - p)} \right]. \end{aligned} \tag{3.5}$$

It would seem preferable to use relations (2.13) rather than the above; for that
reason the evaluations of forms (2.9) have not been included here.

4. Trinomial distributions. The form

$$E\left[ \begin{matrix} \max \\ \min \end{matrix} (n_1, n_2) \right] = \sum_{n_1 + n_2 + n_3 = N} \frac{N!}{n_1!\, n_2!\, n_3!} \left[ \begin{matrix} \max \\ \min \end{matrix} (n_1, n_2) \right] p_1^{n_1} p_2^{n_2} p_3^{n_3} \tag{4.1}$$

may be approximated, for large $N$, by the bivariate normal distribution. Suppose
two attributes $P$ (and not $P = \bar{P}$) and $R$ (and not $R = \bar{R}$) are being observed
in a distribution. Then the four possible outcomes of an experiment
could be represented as the categories $PR$, $P\bar{R}$, $\bar{P}R$, $\bar{P}\bar{R}$ with respective probabilities
$a, b, c, d$; $a + b + c + d = 1$. In such a situation, for large $N$, one may use
a bivariate normal distribution as a limiting form of the above described bivariate
binomial distribution, or multinomial distribution with four categories.
If the probability of one category, say $PR$, is zero, the bivariate normal
distribution can be regarded as a limiting form of a trinomial distribution.
Indeed, defining

$$x_1 = \frac{n_1 - Np_1}{[Np_1(1 - p_1)]^{1/2}}, \qquad x_2 = \frac{n_2 - Np_2}{[Np_2(1 - p_2)]^{1/2}}, \tag{4.2}$$
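the bivariate normal distribution takes the form [1]

$$dF = \frac{1}{2\pi(1 - r^2)^{1/2}} \exp\left[ \frac{-1}{2(1 - r^2)} (x_1^2 - 2r x_1 x_2 + x_2^2) \right] dx_1\, dx_2, \tag{4.3}$$

where

$$r = -\left[ \frac{p_1 p_2}{(1 - p_1)(1 - p_2)} \right]^{1/2}.$$

The expected values are then given approximately by

$$E\left[ \begin{matrix} \max \\ \min \end{matrix} (n_1, n_2) \right] \cong \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \left[ \begin{matrix} \max \\ \min \end{matrix} (n_1, n_2) \right] dF. \tag{4.4}$$

For the special case $p_1 = p_2$, evaluations have been made of $E[\min(n_1, n_2)]$ and $E[\max(n_1, n_2)]$
by the authors. For the finite summation (4.1), powers of $N$ less than the one-half
power were neglected, and the values

$$\begin{aligned} E[\min(n_1, n_2)] &= Np - \left( \frac{Np}{\pi} \right)^{1/2}, \\ E[\max(n_1, n_2)] &= Np + \left( \frac{Np}{\pi} \right)^{1/2} \end{aligned} \tag{4.5}$$

were obtained.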
For the integral case, again for $p_1 = p_2 = p$ and hence for $r = -p/(1 - p)$,
the evaluation proceeds as follows. In virtue of (4.2) and (4.3)

$$\begin{aligned} E[\min(n_1, n_2)] &= Np + [Np(1 - p)]^{1/2} \iint [\min(x_1, x_2)]\, dF \\ &= Np + [Np(1 - p)]^{1/2} \iint [\min(x_1 - x_2, 0)]\, dF. \end{aligned} \tag{4.6}$$

It is convenient to introduce a rotation of axes in order to evaluate integral
(4.6). Indeed, rotation through $\pi/4$ radians will give

$$x_1 = \frac{y_1 - y_2}{\sqrt{2}}, \qquad x_2 = \frac{y_1 + y_2}{\sqrt{2}}, \tag{4.7}$$

$$x_1^2 + \frac{2p}{1 - p}\, x_1 x_2 + x_2^2 = \frac{1}{1 - p}\, y_1^2 + \frac{1 - 2p}{1 - p}\, y_2^2,$$

$$\min(x_1 - x_2, 0) = \min(-y_2\sqrt{2}, 0),$$

$$\frac{\partial(x_1, x_2)}{\partial(y_1, y_2)} = 1.$$
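Thus integral (4.6) becomes

$$\begin{aligned} E[\min(n_1, n_2)] &= Np + [Np(1 - p)]^{1/2}\, \frac{1 - p}{2\pi(1 - 2p)^{1/2}} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \exp\left[ -\frac{(1 - p)^2}{2(1 - 2p)} \left( \frac{y_1^2}{1 - p} + \frac{(1 - 2p)\, y_2^2}{1 - p} \right) \right] \min(-y_2\sqrt{2}, 0)\, dy_1\, dy_2 \\ &= Np - [Np(1 - p)]^{1/2}\, \frac{(1 - p)\sqrt{2}}{2\pi(1 - 2p)^{1/2}} \int_{0}^{\infty} \int_{-\infty}^{\infty} y_2 \exp\left[ -\frac{(1 - p)\, y_1^2}{2(1 - 2p)} - \frac{(1 - p)\, y_2^2}{2} \right] dy_1\, dy_2. \end{aligned} \tag{4.11}$$

As indicated above, it is convenient to consider the form as an iterated integral,
and integrate first with respect to $y_1$. The evaluation of (4.11) presents no serious
difficulties,

$$\begin{aligned} E[\min(n_1, n_2)] &= Np - [Np(1 - p)]^{1/2}\, \frac{(1 - p)\sqrt{2}}{2\pi(1 - 2p)^{1/2}} \left[ \frac{2\pi(1 - 2p)}{1 - p} \right]^{1/2} \int_{0}^{\infty} y_2 \exp\left[ -\frac{(1 - p)\, y_2^2}{2} \right] dy_2 \\ &= Np - \left( \frac{Np}{\pi} \right)^{1/2}, \\ E[\max(n_1, n_2)] &= Np + \left( \frac{Np}{\pi} \right)^{1/2}. \end{aligned} \tag{4.12}$$

Note that these values are the same as those obtained from the finite summation
form (4.1), as given by (4.5).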
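
For moderate $N$ the agreement between the finite summation and the large-$N$ values $Np \mp (Np/\pi)^{1/2}$ can be inspected directly. The following Python sketch sums the trinomial form exactly for $p_1 = p_2 = p$ and prints the approximation alongside. The function names and the sample values $N = 60$, $p = 0.3$ are illustrative assumptions; the multinomial weights are computed through `math.lgamma` to avoid overflow.

```python
from math import exp, lgamma, log, pi, sqrt


def trinomial_expected_min_max(N, p):
    """Exact E[min(n1, n2)] and E[max(n1, n2)] for a trinomial with p1 = p2 = p.

    Requires 0 < p < 1/2 so that the third category has positive probability.
    """
    q = 1.0 - 2.0 * p  # probability of the third category
    e_min = 0.0
    e_max = 0.0
    for n1 in range(N + 1):
        for n2 in range(N - n1 + 1):
            n3 = N - n1 - n2
            # log of N!/(n1! n2! n3!) * p^(n1 + n2) * q^n3
            log_w = (
                lgamma(N + 1) - lgamma(n1 + 1) - lgamma(n2 + 1) - lgamma(n3 + 1)
                + (n1 + n2) * log(p) + n3 * log(q)
            )
            w = exp(log_w)
            e_min += min(n1, n2) * w
            e_max += max(n1, n2) * w
    return e_min, e_max


def approx_min_max(N, p):
    """Large-N approximations Np - (Np/pi)^(1/2) and Np + (Np/pi)^(1/2)."""
    shift = sqrt(N * p / pi)
    return N * p - shift, N * p + shift


if __name__ == "__main__":
    print(trinomial_expected_min_max(60, 0.3))
    print(approx_min_max(60, 0.3))
```

Since $\min(n_1, n_2) + \max(n_1, n_2) = n_1 + n_2$, the two exact values must add to $2Np$, which gives a quick sanity check on the summation.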
To evaluate the variance

$$\sigma^2_{\min} = E[\min(n_1^2, n_2^2)] - \left[ Np - \left( \frac{Np}{\pi} \right)^{1/2} \right]^2, \tag{4.13}$$

a finite summation form similar to (4.1) or an integral form similar to (4.4) may
be used.

In case the integral form is used, it is convenient to introduce the variables
$x_1$ and $x_2$ as defined by (4.2). One then gets

$$\begin{aligned} E[\min(n_1^2, n_2^2)] &= N^2 p^2 + Np(1 - p)\, E\left[ \min\left( x_1^2 + 2\left[ \frac{Np}{1 - p} \right]^{1/2} x_1,\; x_2^2 + 2\left[ \frac{Np}{1 - p} \right]^{1/2} x_2 \right) \right] \\ &= N^2 p^2 + Np(1 - p) + Np(1 - p)\, E\left[ \min\left( x_1^2 - x_2^2 + 2\left[ \frac{Np}{1 - p} \right]^{1/2} (x_1 - x_2),\; 0 \right) \right], \end{aligned} \tag{4.14}$$

in which one integration over the whole space has been carried out. Rotating
axes as per (4.7) one gets

$$E[\min(n_1^2, n_2^2)] = N^2 p^2 + Np(1 - p) + 2Np(1 - p)\, E\left[ \min\left( -y_1 y_2 - \left[ \frac{2Np}{1 - p} \right]^{1/2} y_2,\; 0 \right) \right]. \tag{4.15}$$

In evaluating this last expected value form, the region of integration may be considered
as a sum of separate regions. Over some regions the integrand is zero,
in other regions the non-negative product

$$y_2 \left\{ y_1 + \left[ \frac{2Np}{1 - p} \right]^{1/2} \right\}$$

is the integrand and this condition gives

$$y_2 \ge 0,\; y_1 \ge -\left[ \frac{2Np}{1 - p} \right]^{1/2} \qquad \text{and} \qquad y_2 < 0,\; y_1 < -\left[ \frac{2Np}{1 - p} \right]^{1/2}$$

as the regions of integration with the non-negative product as integrand.

Since the assumption that $N$ is large has already been made, it is convenient
to approximate further here and assume $[2Np/(1 - p)]^{1/2}$ is large, and in particular
to assume that integration from $-[2Np/(1 - p)]^{1/2}$ to $+\infty$ is equal to integration
from $-\infty$ to $+\infty$ for the integrand under consideration and for iterated
integration with respect to the variable $y_1$.

Remark: An equivalent assumption is needed in the finite summation case 
when approximating (Np)! by the use of Stirling’s formula. 

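Thus one gets (since one of the above regions of integration is to be neglected)

$$\begin{aligned} E\left[ \min\left\{ -y_2 \left( y_1 + \left[ \frac{2Np}{1 - p} \right]^{1/2} \right),\; 0 \right\} \right] &\cong -\frac{1 - p}{2\pi(1 - 2p)^{1/2}} \int_{0}^{\infty} \left[ \int_{-\infty}^{\infty} y_2 \left( y_1 + \left[ \frac{2Np}{1 - p} \right]^{1/2} \right) \exp\left\{ -\frac{(1 - p)\, y_1^2}{2(1 - 2p)} - \frac{(1 - p)\, y_2^2}{2} \right\} dy_1 \right] dy_2 \\ &= -\frac{1 - p}{2\pi(1 - 2p)^{1/2}} \left[ \frac{2Np}{1 - p} \right]^{1/2} \left[ \frac{2\pi(1 - 2p)}{1 - p} \right]^{1/2} \frac{1}{1 - p} \\ &= -\frac{1}{1 - p} \left( \frac{Np}{\pi} \right)^{1/2}. \end{aligned} \tag{4.16}$$

Collecting results from (4.13), (4.15) and (4.16) one obtains

$$\sigma^2_{\min} = Np\left( 1 - p - \frac{1}{\pi} \right). \tag{4.17}$$

By a similar procedure, one may compute also that

$$\sigma^2_{\max} = Np\left( 1 - p - \frac{1}{\pi} \right). \tag{4.18}$$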
For this three category case, the proof used to obtain relation (2.8) is no longer
applicable, yet the relation $\sigma^2_{\min} = \sigma^2_{\max}$ still holds for the approximating relations
given above.


5. Conclusion. Since the normal distribution was used in some instances to 
obtain approximations for the binomial and multinomial distributions, many 
of the maximum and minimum relations stated as approximations for the multi- 
nomial are exact for the appropriate normal distribution. 

No convenient formulation was found for the general trinomial case ($p_1$,
$p_2$, $p_3$ unequal) similar to relations (4.5), (4.17), and (4.18).

As possible applications of the general solution of this problem, the referee
has kindly supplied the authors with a reference to Guttman [2]. Sampling
theory provided by the general solution to this problem could be used in connection
with Guttman's reliability coefficient.
1. Summary. Given a chance vector $X$ with distribution function $F(X, \theta_T)$,
where $\theta_T$ denotes the true unknown parameter vector, a broad class of estimates
of $\theta_T$ is derived which is shown to be identical with the class of all consistent
estimates of $\theta_T$. A sub-class is obtained each member of which has the following
properties: a.) Its construction depends upon the solution of an equation involving
a single vector function of the parameter vector $\theta$ and the members of
a sequence $\{X_i\}$ of independent and identically distributed chance vectors;
b.) the estimate so obtained converges almost certainly to $\theta_T$; c.) it is a symmetric
function of the members of the sequence $\{X_i\}$. In order to obtain this sub-class
it is postulated that a function of $X$ and $\theta$ exists (continuous in $\theta$ for a
certain neighborhood of the true parameter $\theta_T$ and existing for each $X$ in a subset
of the sample space) which satisfies a Lipschitz condition in $\theta$. In particular
if a density function $f(X, \theta_T)$ exists satisfying certain conditions, the consistency
of the maximum likelihood estimate can be established under regularity conditions
quite different from those usually assumed [1]. This is not to be interpreted
as a weakening of the usual regularity conditions but rather as an extension of
the class of consistent likelihood estimates obtained under the usual regularity
conditions.





2. Introduction. The present work is the result of investigations into the
following question posed by J. Neyman: What happens to the asymptotic
properties of the maximum likelihood estimate of $\theta_T$ when the usual regularity
conditions on $F(X, \theta)$ are relaxed? The consistency and efficiency of the estimate
are the properties in question, and the present work arose from the observation
that consistency at least can be obtained under conditions quite different
from those usually assumed [1]. The assumptions made below are existential
in nature, and no general methods are given for the actual construction
of consistent estimates. As stated above, however, the results of this work can
be used to widen the class of consistent maximum likelihood estimates established
heretofore. Although simple upper and lower bounds for the variance of a consistent
estimate are obtained, no answer is given to the question of determining
the efficiency of such an estimate. In regard to consistent estimates, J. Neyman
and E. Scott have discussed recently [2] the need for a systematic method of
obtaining consistent estimates. Wald has given necessary and sufficient conditions
[3] for the existence of a uniformly consistent estimate of an unknown parameter
$\theta$ when there exists a density function continuous jointly in all of its
arguments, and it is assumed that the domain of each of the unknown parameters
is a closed and bounded set. It is hoped that the class of consistent estimates
derived below will help shed some light on a general method for actually obtaining
such estimates. In this connection it is important to point out that if
necessary and sufficient conditions were known for the existence and uniqueness
of a fixed point for a transformation on $E_m$ to $E_m$, the weakest possible conditions
could be expressed for the existence of consistent estimates obtained in the
manner given below. It is surmised that the use of a Hölder condition of order
one as presented below is stronger than required.

Let $\{X_i\}$, $i = 1, 2, \cdots, n, \cdots$, be a sequence of chance vectors in which
$X_i$ possesses the probability distribution function $F_i(X, \theta)$ depending upon an
unknown parameter vector $\theta$. The vector $X$ has components $X_j$, $j = 1, 2, \cdots, s$,
where $X_j$ is a chance variable, and $\theta$ has components $\theta_i$, $i = 1, 2, \cdots, m$. The
problem is to obtain a function of the $X_i$ which is a consistent estimate of $\theta$.
We denote by $E_s$ the real Euclidean space of $s$ dimensions and by $E_s'$ a subset
of $E_s$ excluding at most a set of probability measure zero. For convenience we
use the symbol $\| \theta \|$ to denote the norm of $\theta$, where

$$\| \theta \| = (\theta_1^2 + \theta_2^2 + \cdots + \theta_m^2)^{1/2}.$$

We define in a similar manner the norm of any function which assumes values
in $E_m$. The following assumption is made:

ASSUMPTION 1. There exists a point $\theta_0$ and a neighborhood $W(\theta_0, a)$ of $\theta_0$ having
radius $a$ ($a > 0$) which contains the true parameter vector $\theta_T$ as an interior point
and there exists an infinite sequence of functions $G_n(X_1, X_2, \cdots, X_n; \theta)$, $n = 1, 2, \cdots$,
ad inf. on $E_s \times E_m$ to $E_m$ such that

(a) for each $n$ the equation

$$G_n(X_1, X_2, \cdots, X_n; \theta) = 0$$

has a unique solution $\theta = \theta_n^*(X_1, X_2, \cdots, X_n)$ in $W(\theta_0, a)$. (For the sake of
brevity we usually write $G_n(X; \theta) = G_n(X_1, X_2, \cdots, X_n; \theta)$.)
(b) For every pair of values of $\theta_1, \theta_2$ in $W(\theta_0, a)$ and for some $K$ with $0 < K < 1$

$$\lim_{n \to \infty} P\{\| G_n(X, \theta_1) - G_n(X, \theta_2) - (\theta_1 - \theta_2) \| \le K \| \theta_1 - \theta_2 \|\} = 1.$$

(c) For every $\epsilon > 0$,

$$\lim_{n \to \infty} P\{\| G_n(X, \theta_T) \| < \epsilon\} = 1.$$

3. A consistent estimate of $\theta_T$.
THEOREM 3.1. The solution $\theta = \theta_n^*(X_1, X_2, \cdots, X_n)$ of the equation

$$G_n(X_1, X_2, \cdots, X_n; \theta) = 0$$

is a consistent estimate of $\theta_T$, provided $G_n(X; \theta)$ satisfies Assumption 1.
PROOF: From Assumption 1b it follows that given $\delta > 0$, we have for all
$n > N'(\delta)$,

$$P\{\| G_n(X, \theta_T) - (\theta_T - \theta_n^*) \| \le K \| \theta_T - \theta_n^* \|\} > 1 - \delta, \tag{3.1}$$

since $G_n(X, \theta_n^*) = 0$. It follows from (3.1) that for all $n > N'(\delta)$,

$$P\left\{ \frac{\| G_n(X, \theta_T) \|}{1 + K} \le \| \theta_T - \theta_n^* \| \le \frac{\| G_n(X, \theta_T) \|}{1 - K} \right\} > 1 - \delta. \tag{3.2}$$

From Assumption 1c it follows that there exists $N''(\epsilon, \delta)$ such that $n > N''(\epsilon, \delta)$
implies

$$P\{\| G_n(X, \theta_T) \| < \epsilon(1 - K)\} > 1 - \delta. \tag{3.3}$$

(3.2), (3.3), and a familiar formula in probability imply for all
$n > \max[N'(\delta), N''(\epsilon, \delta)]$,

$$P\{\| \theta_T - \theta_n^* \| < \epsilon\} > 1 - 2\delta.$$

It is noted that (3.2) characterizes the speed of convergence of the estimate
$\theta_n^*$. The following uniqueness property is noted: If a given sequence of functions
$G_n(X_1, X_2, \cdots, X_n; \theta)$ satisfies Assumption 1, then $\theta_T$ is the unique parameter
vector in $W(\theta_0, a)$ which satisfies item c of Assumption 1. The proof of this remark
is left to the reader.

The following remark demonstrates the extreme generality of the class of
consistent estimates obtained in the above manner: The set of estimates of the
parameter vector $\theta_T$ obtained from the class of all sequences of functions

$$G_n(X_1, X_2, \cdots, X_n; \theta)$$

satisfying Assumption 1 is identical with the set of all consistent estimates of the
parameter vector $\theta_T$. The proof of this remark is quite obvious and is left to the
reader.


4. Properties of a sub-class of consistent estimates. The question arises
naturally concerning a general method for the construction of a sequence of
functions $G_n(X_1, X_2, \cdots, X_n; \theta)$ satisfying Assumption 1. The author knows
of no general method. It is possible to describe a sub-class of the class of consistent
estimates, the construction of which depends upon the existence of one
function rather than a sequence of functions. This is possible by application
of the strong law of large numbers, and in this way consistent estimates of the
parameter vector are obtained which converge almost certainly to the true
value $\theta_T$. Moreover it is clear that under certain conditions the function

$$G_n(X_1, X_2, \cdots, X_n; \theta_T)$$

defined as in equation (4.1) below is an asymptotically $m$-variate normal variable.

ASSUMPTION 2. Let $\{X_i\}$, $i = 1, 2, \cdots, n, \cdots$, be a sequence of independently
and identically distributed chance vectors with common distribution function $F(X; \theta)$,
where $\theta$ is again the unknown parameter vector.

ASSUMPTION 3. There exists a function $g(X, \theta)$ on $E_s \times E_m$ to $E_m$ such that
(a) for every $X \in E_s'$ and every distinct pair $(\theta_1, \theta_2)$ in $W(\theta_0, a)$,

$$\| g(X, \theta_1) - g(X, \theta_2) - (\theta_1 - \theta_2) \| \le K \| \theta_1 - \theta_2 \|,$$

where $0 < K < 1$ and $\| g(X, \theta_0) \| < (1 - K)a$.
(b) $E g(X, \theta_T) = \int g(X, \theta_T)\, dF(X, \theta_T) = 0$.

We define the function $G_n(X, \theta)$ as follows:

$$G_n(X, \theta) = \frac{1}{n} \sum_{i=1}^{n} g(X_i, \theta). \tag{4.1}$$

The following lemmas are required:

LEMMA 4.1. $G_n(X, \theta)$ as defined in (4.1) satisfies the conditions in Assumption
3 with $G_n(X, \theta)$ replacing $g(X, \theta)$.

The proof is sufficiently obvious to be omitted.

LEMMA 4.2. $G_n(X, \theta_T) \to 0$ almost certainly as $n \to \infty$, if Assumptions 2 and
3b hold.

PROOF: Since $E g(X_i, \theta_T) = 0$, $i = 1, 2, \cdots, n$, and the chance variables
$g(X_i, \theta_T)$ are independently and identically distributed, this follows immediately
from a theorem due to Kolmogorov [5].

THEOREM 4.1. If Assumptions 2 and 3 hold, then the equation $G_n(X, \theta) = 0$
has a unique solution $\theta = \theta_n^*(X_1, X_2, \cdots, X_n)$ in $W(\theta_0, a)$, where $\theta_n^*$ is a consistent
estimate of $\theta_T$ and is moreover a symmetric function of the observation vectors
$X_1, X_2, \cdots, X_n$.

PROOF: We obtain the solution $\theta_n^*$ by the method of successive substitutions.
Define

$$\theta_1 = \theta_0 - G_n(X, \theta_0), \quad \cdots, \quad \theta_{k+1} = \theta_k - G_n(X, \theta_k).$$

In view of Lemma 4.1 we can apply a well known existence theorem [4] in the
theory of functions to prove that the sequence $\{\theta_k\}$ converges to a limit $\theta_n^*$ which
is also in $W(\theta_0, a)$. The same theorem establishes the uniqueness of the solution
in $W(\theta_0, a)$. This uniqueness property together with Lemmas 4.1 and 4.2 establish
the fact that the sequence $\{G_n(X, \theta)\}$ as defined in equation (4.1) satisfies
Assumption 1. It follows immediately from Theorem 3.1 that $\theta_n^*$ is a consistent
estimate of $\theta_T$. We can, however, prove a stronger relationship.
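
The method of successive substitutions used in the proof can be sketched in a few lines of Python for a scalar parameter. The function $g$ below is a hypothetical example chosen here, not taken from the paper: it satisfies the Lipschitz condition of Assumption 3a with $K = 0.5$, since $g(x, \theta_1) - g(x, \theta_2) - (\theta_1 - \theta_2) = 0.5[\sin(\theta_1 - x) - \sin(\theta_2 - x)]$. The neighborhood $W(\theta_0, a)$ is not modelled; the starting point, tolerance and iteration cap are likewise assumptions.

```python
from math import sin


def g(x, theta):
    """Hypothetical g(X, theta) satisfying the Lipschitz condition with K = 0.5."""
    return (theta - x) + 0.5 * sin(theta - x)


def G_n(sample, theta):
    """Sample average of g(X_i, theta), as in (4.1)."""
    return sum(g(x, theta) for x in sample) / len(sample)


def solve_by_successive_substitutions(sample, theta_0, tol=1e-12, max_iter=1000):
    """Iterate theta_{k+1} = theta_k - G_n(X, theta_k) until the update is below tol."""
    theta = theta_0
    for _ in range(max_iter):
        step = G_n(sample, theta)
        theta -= step
        if abs(step) < tol:
            break
    return theta


if __name__ == "__main__":
    sample = [1.2, 0.7, 1.9, 1.4, 0.8]
    theta_star = solve_by_successive_substitutions(sample, theta_0=0.0)
    print(theta_star, G_n(sample, theta_star))  # the second value should be near 0
```

Because the map $\theta \mapsto \theta - G_n(X, \theta)$ contracts distances by the factor $K$, the iteration converges geometrically, and the result does not depend on the order of the observations.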

THEOREM 4.2. The estimate $\theta_n^*$ defined in Theorem 4.1 converges almost certainly
to $\theta_T$.

PROOF: From Lemma 4.2 we know that given any number $\epsilon > 0$, there exists
an integer $N(\epsilon)$ such that for all $n > N(\epsilon)$

$$P\{\| G_n(X, \theta_T) \| < \epsilon(1 - K)\} = 1.$$

From Assumption 3a and Lemma 4.1 we see that

$$\| G_n(X, \theta_T) - (\theta_T - \theta_n^*) \| \le K \| \theta_T - \theta_n^* \|$$

since $G_n(X, \theta_n^*) = 0$. Then

$$\| G_n(X, \theta_T) \| \ge (1 - K) \| \theta_T - \theta_n^* \|.$$

Clearly the set of $X \in E_s'$ for which $\| \theta_T - \theta_n^* \| < \epsilon$ includes the set of $X$ for
which $\| G_n(X, \theta_T) \| < \epsilon(1 - K)$.
Therefore, for $n > N(\epsilon)$,

$$P\{\| \theta_T - \theta_n^* \| < \epsilon\} \ge P\{\| G_n(X, \theta_T) \| < \epsilon(1 - K)\} = 1,$$

and the proof is completed.

The uniqueness of the parameter value $\theta_T$ in the neighborhood $W(\theta_0, a)$
follows immediately from the remark succeeding Theorem 3.1 since Assumption
1 is valid in Theorems 4.1 and 4.2.

It is interesting to note that the application of a theorem in the theory
of functions of a real variable gives the result that if the function $g(X, \theta)$ is
continuous on a bounded and closed set in $E_s \times E_m$ and if we take for $E_s'$ a
bounded and closed set, then $\theta_n^*(X_1, X_2, \cdots, X_n)$ is a continuous function of
$X_1, X_2, \cdots, X_n$ for $X_i \in E_s'$ ($i = 1, 2, \cdots, n$). If we assume the continuity of
$g(X, \theta)$ in $X$ for each $\theta$ in $W(\theta_0, a)$ the following remark demonstrates an interesting
relationship concerning the uniqueness of the solution for $\theta$ in the equation
$Eg(X, \theta) = 0$: If in addition to Assumption 3 we assume that $g(X, \theta)$ is
continuous in $X$ for every $X$ in $E_s'$ and every $\theta$ in $W(\theta_0, a)$ and if at least one
of the components $g_i(X, \theta)$, $1 \le i \le m$, of the $m$-dimensional vector function
$g(X, \theta)$ satisfies also a Lipschitz condition:

$$\| g_i(X, \theta_1) - g_i(X, \theta_2) - (\theta_{1i} - \theta_{2i}) \| \le K \| \theta_1 - \theta_2 \|$$

for every distinct pair $\theta_1, \theta_2$ in $W(\theta_0, a)$, then for all $\theta$ in $W(\theta_0, a)$, $\theta_T$ is the unique
solution for $\theta$ of the equation $Eg(X, \theta) = 0$.
The proof of this remark is left to the reader.













5. Upper and lower bounds for the expected squared error of $\theta_n^*(X_1, X_2, \cdots, X_n)$.
Denote by $g_i(X, \theta)$, $i = 1, 2, \cdots, m$, the $m$ components of the chance vector
$g(X, \theta)$. We now make an additional assumption.

ASSUMPTION 4.

$$E[g_i(X, \theta_T)\, g_j(X, \theta_T)] = \lambda_{ij}$$

exists for $i = 1, 2, \cdots, m$ and $j = 1, 2, \cdots, m$.

It follows from Assumptions 2, 3b, and 4 and the Lindeberg-Lévy form of the
Central Limit Theorem that the distribution of the vector $\sqrt{n}\, G_n(X, \theta_T)$ tends to an
$m$-variate normal distribution with means zero and moment matrix $(\lambda_{ij})$.

Now from Assumption 3a and Lemma 4.1

$$\frac{\| G_n(X, \theta_T) \|}{1 + K} \le \| \theta_T - \theta_n^* \| \le \frac{\| G_n(X, \theta_T) \|}{1 - K}. \tag{5.1}$$

For convenience define

$$\lambda = \sum_{i=1}^{m} \lambda_{ii}.$$

We obtain then

$$E \| G_n(X, \theta_T) \|^2 = \frac{\lambda}{n}.$$

It follows then from equation (5.1) that

$$\frac{\lambda}{n(1 + K)^2} \le E \| \theta_T - \theta_n^* \|^2 \le \frac{\lambda}{n(1 - K)^2}.$$

6. The consistency of maximum likelihood estimates. The results of this
paper can be used to extend the class of consistent maximum likelihood estimates
established heretofore [1]. Assume that $F(X, \theta)$ admits a density function
$f(X, \theta)$ with the property

$$\frac{\partial}{\partial \theta} \int f(X, \theta)\, dX = \int \frac{\partial f}{\partial \theta}(X, \theta)\, dX.$$

Then

$$E\left[ \frac{\partial}{\partial \theta} \ln f(X, \theta) \right] = 0.$$

The maximum likelihood estimate of $\theta_T$ is obtained by solving the equation

$$\frac{\partial}{\partial \theta} \ln L(X, \theta) = 0,$$

where

$$L(X, \theta) = \prod_{i=1}^{n} f(X_i, \theta).$$

If a sample $X_1, X_2, \cdots, X_n$ is obtained as the result of $n$ random independent
drawings from the distribution having the c.d.f. $F(X, \theta)$, the sample values
will satisfy Assumption 2. Assumption 3b holds as assumed above. If we assume
also that the function $\partial/\partial\theta\, \ln f(X, \theta)$ satisfies Assumption 3a, it follows directly
from Theorem 4.2 that the maximum likelihood estimate converges almost
certainly to the true parameter vector as the sample size approaches infinity.

The author wishes to acknowledge his indebtedness and gratitude to Professor 
Jerzy Neyman for the many helpful suggestions made during the preparation 
of the paper. 
DISTRIBUTION OF THE SUM OF ROOTS OF A DETERMINANTAL
EQUATION UNDER A CERTAIN CONDITION 


By D. N. NANDA
University of North Carolina 


1. Summary. This paper is in continuation of the author’s first two papers 
[1] and [2]. In this paper a method is described by which it is possible to derive 
the distribution of the sum of roots of a certain determinantal equation under the 
condition that m = 0. This condition implies, when the results are applied to 
canonical correlations, that the numbers of variates in the two sets differ by 
unity. The distributions for the sum of roots under this condition have been 
obtained for l = 2, 3 and 4 and are given in this paper. This paper also derives
the moments of these distributions. 


2. Introduction. The reader should refer to the first two papers of this series 
[1] and [2] for detailed explanation of the preliminaries essential for this paper. 
The distribution of any root of the determinantal equation, specified by its 
rank when the roots are arranged in a descending order of magnitude, was 
derived by the author [1]. The distribution of the largest root was expressed as 


(1) $P_r(\theta_1 < x) = C(l, m, n)\, F_{l,m,n}(x) = \text{const.}\, (0, l, l - 1, \cdots, 1, x;\ m, n).$


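3. Method. Putting $\theta_i = \rho_i/n$ in R(l, m, n) as given in [1] and allowing n to
tend to infinity, the distribution density reduces to

$$R(l, m) = \text{const.} \prod_i \rho_i^m \prod_{i<j} (\rho_i - \rho_j)\, e^{-\sum \rho_i} \qquad (0 < \rho_l < \rho_{l-1} < \cdots < \rho_1 < \infty),$$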

where the constant is independent of n, by [2]. If we replace x by x/n in the
right-hand side of (1) and allow n to tend to infinity, then the resulting function
$G_{l,m}(x)$ is independent of n and it can be shown by comparing the two methods
A and B in [2], that

(2) $$\int_{0 < \rho_l < \cdots < \rho_1 < x} R(l, m) \prod d\rho_i = G_{l,m}(x).$$


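This is a constant multiple of

(3) $$g(x, m) = \int_{0 < \rho_l < \rho_{l-1} < \cdots < \rho_1 < x} \prod \rho_i^m \prod_{i<j} (\rho_i - \rho_j)\, e^{-\sum \rho_i} \prod d\rho_i = \text{const.}\; x^{\,l + lm + l(l-1)/2}\, \varphi(x, m).$$

Putting $\rho_i = x y_i$, we have

(4) $$\int_{0 < y_l < y_{l-1} < \cdots < y_1 < 1} \prod y_i^m \prod_{i<j} (y_i - y_j)\, e^{-x \sum y_i} \prod dy_i = \text{const.}\; \varphi(x, m).$$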
The left-hand side is proportional to the moment generating function for the
sum of roots when n = 0.

Let $y_1 = 1 - \theta_l$, $y_2 = 1 - \theta_{l-1}$, $\cdots$, $y_l = 1 - \theta_1$; then (4) gives

(5) $$\int_{0 < \theta_l < \theta_{l-1} < \cdots < \theta_1 < 1} \prod (1 - \theta_i)^m \prod_{i<j} (\theta_i - \theta_j)\, e^{-x \sum (1 - \theta_i)} \prod d\theta_i = \text{const.}\; \varphi(x, m).$$

Let m be changed to n and both sides be multiplied by $e^{lx}$, then we get

(6) $$\int_{0 < \theta_l < \theta_{l-1} < \cdots < \theta_1 < 1} \prod (1 - \theta_i)^n \prod_{i<j} (\theta_i - \theta_j)\, e^{x \sum \theta_i} \prod d\theta_i = \text{const.}\; e^{lx}\, \varphi(x, n).$$

The left-hand side of (6) is the moment generating function for the sum of roots
when m = 0.

The method for obtaining the probability distributions is described in detail
for each of the cases l = 2, 3, in the following sections.

It may, however, be added here that the condition m = 0 implies that
|p - q| = 1 in the case of canonical correlations. It also implies, in generalized
analysis of variance, that if we have K samples and measurements are made on p
characters then K - 1 and p should differ by unity. Thus the distribution is
given for 5 samples and 3 characters when l = 3 (p = 3).


4. Distribution of the sum of roots when m = 0.
(a) l = 2. The value of $G_{2,m}(x)$ has been given in [2] as

(7) $$G_{2,m}(x) = K(2, m) \left[ 2 \int_0^x u^{2m+1} e^{-2u}\, du - x^{m+1} e^{-x} \int_0^x u^m e^{-u}\, du \right],$$

where $K(2, m) = 2^{2m+1}/\Gamma(2m + 2)$. Then in the notation just given

$$g(x, m) = 2 \int_0^x u^{2m+1} e^{-2u}\, du - x^{m+1} e^{-x} \int_0^x u^m e^{-u}\, du.$$

Replacing u by xu, we get

$$g(x, m) = 2 x^{2m+2} \int_0^1 u^{2m+1} e^{-2xu}\, du - x^{2m+2} e^{-x} \int_0^1 u^m e^{-xu}\, du$$

$$= \frac{x^{2m+2}}{m+1} \left[ \int_0^1 e^{-2xu}\, d(u^{2m+2}) - e^{-x} \int_0^1 e^{-xu}\, d(u^{m+1}) \right]$$

$$= \frac{x^{2m+3}}{m+1} \left[ 2 \int_0^1 u^{2m+2} e^{-2xu}\, du - e^{-x} \int_0^1 u^{m+1} e^{-xu}\, du \right],$$

so that

(8) $$\varphi(x, m) = \text{const.} \left[ 2 \int_0^1 u^{2m+2} e^{-2xu}\, du - e^{-x} \int_0^1 u^{m+1} e^{-xu}\, du \right],$$
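and according to (6),

$$\int_{0 < \theta_2 < \theta_1 < 1} \prod (1 - \theta_i)^n (\theta_1 - \theta_2)\, e^{x \sum \theta_i}\, d\theta_1\, d\theta_2$$

$$= \text{const.}\; e^{2x} \left[ 2 \int_0^1 u^{2n+2} e^{-2xu}\, du - e^{-x} \int_0^1 u^{n+1} e^{-xu}\, du \right]$$

$$= \text{const.} \left[ 2 \int_0^1 (1 - u)^{2n+2} e^{2xu}\, du - \int_0^1 (1 - u)^{n+1} e^{xu}\, du \right],$$

by replacing u by 1 - u. Or,

(9) $$E(e^{x(\theta_1 + \theta_2)}) = \text{const.} \left[ 2 \int_0^1 (1 - u)^{2n+2} e^{2xu}\, du - \int_0^1 (1 - u)^{n+1} e^{xu}\, du \right].$$

The constant can be evaluated by putting x = 0.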

Then let $P_r(\theta_1 + \theta_2 \le Z) = \text{const.}\, [F_1(Z) - F_2(Z)]$, where $F_1(Z)$ and $F_2(Z)$ are
cumulative distribution functions given by integrating the density $(1 - u)^{2n+2}$ of
2u and $(1 - u)^{n+1}$ of u, respectively. It is easily seen that

$$F_2(Z) = \int_0^Z (1 - u)^{n+1}\, du = [1 - (1 - Z)^{n+2}]/(n + 2) \qquad (Z \le 1).$$

Since $F_1(Z)$ is to be obtained from the density of 2u, we may substitute v = 2u
and then integrate. Thus

$$F_1(Z) = 2 \int_0^Z (1 - v/2)^{2n+2}\, dv/2 = 2[1 - (1 - Z/2)^{2n+3}]/(2n + 3) \qquad (Z \le 2).$$


Hence the result for l = 2 is

$$P_r(\theta_1 + \theta_2 \le Z) = 2(n + 2)[1 - (1 - Z/2)^{2n+3}] - (2n + 3)[1 - (1 - Z)^{n+2}] \qquad (0 \le Z \le 1),$$

$$= 2(n + 2)[1 - (1 - Z/2)^{2n+3}] - (2n + 3) \qquad (1 \le Z \le 2).$$
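The piecewise result for l = 2 is easy to check numerically. The following is a minimal sketch, not part of the original paper; the function names are hypothetical. It evaluates the distribution function and recovers the mean of the sum of roots from the integral of one minus the distribution function over (0, 2).

```python
def cdf_sum_two_roots(z, n):
    """P(theta_1 + theta_2 <= z) for l = 2, m = 0, from the piecewise result."""
    if z <= 0.0:
        return 0.0
    if z >= 2.0:
        return 1.0
    first = 2 * (n + 2) * (1 - (1 - z / 2) ** (2 * n + 3))
    if z <= 1.0:
        return first - (2 * n + 3) * (1 - (1 - z) ** (n + 2))
    return first - (2 * n + 3)


def mean_from_cdf(n, steps=20000):
    """Mean of the sum of roots as the integral of (1 - CDF) over [0, 2],
    evaluated with the midpoint rule."""
    h = 2.0 / steps
    return sum((1 - cdf_sum_two_roots((i + 0.5) * h, n)) * h for i in range(steps))


if __name__ == "__main__":
    for n in (0, 1, 5):
        print(n, cdf_sum_two_roots(0.5, n), mean_from_cdf(n))
```

For n = 0 the joint density of the two roots is proportional to their difference on the unit square, which gives a quick hand check of the value at Z = 0.5.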


(b) l = 3. The value of $G_{3,m}(x)$ as given in [2] is changed as


G3,m(z) = K(3, m) {2 [ ute du | u"e “du —2 | ute du 
0 0 0 


m+2 — 


z x 1 
. [ yet? 6 du — x é | 2a [ u™ +2 et du — gmts e7 
0 0 


m+ 1 
1 
. [ git eo au}, 
0 


using (8). K(3, m) is a constant independent of n. Putting xu for u in only the 
first two terms of the right-hand side of the above equation, we get 
1 1
G3,m(x) = k(3, m)art® 12 [ PY cai oo au | ur” Pa du 
0 0 





' i may 1 
(10) ee 2 | yn? eo au [ ym e du — 2e | went? et du 
: ° 0 +1 


m 


so 1 
1 -_ 
+. | u™t o™ du >. 
m-+ 1 Jo 
By integrating by parts we get $x^{3m+6}$ as a common factor on the right-hand
side of the above equation. Then according to (5) and (6) we have

$$\int_{0 < y_3 < y_2 < y_1 < 1} \prod y_i^m \prod_{i<j} (y_i - y_j)\, e^{-x \sum y_i} \prod dy_i = \text{const.} \Big\{ 2(m + 2) \int_0^1 u^{2m+3} e^{-2xu}\, du \int_0^1 u^{m+1} e^{-xu}\, du$$

$$+ 2(2m + 3)\, e^{-x} \int_0^1 u^{2m+4} e^{-2xu}\, du - 4(m + 2)\, e^{-x} \int_0^1 u^{2m+3} e^{-2xu}\, du + e^{-2x} \int_0^1 u^{m+2} e^{-xu}\, du \Big\}.$$
0 0 


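Putting $y_1 = 1 - \theta_3$, $y_2 = 1 - \theta_2$, $y_3 = 1 - \theta_1$ and, changing m to n and
multiplying with $e^{3x}$ we get

(11) $$\int_{0 < \theta_3 < \theta_2 < \theta_1 < 1} \prod (1 - \theta_i)^n \prod_{i<j} (\theta_i - \theta_j)\, e^{x \sum \theta_i} \prod d\theta_i = \text{const.} \Big\{ 2(n + 2) \int_0^1 u^{2n+3} e^{2x(1-u)}\, du \int_0^1 u^{n+1} e^{x(1-u)}\, du$$

$$+ 2(2n + 3) \int_0^1 u^{2n+4} e^{2x(1-u)}\, du - 4(n + 2) \int_0^1 u^{2n+3} e^{2x(1-u)}\, du + \int_0^1 u^{n+2} e^{x(1-u)}\, du \Big\}.$$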
Thus we have

$$P_r(\theta_1 + \theta_2 + \theta_3 \le Z) = \text{const.}\, \{F_1(Z) + F_2(Z) + F_3(Z) + F_4(Z)\},$$

where $F_1(Z)$, $F_2(Z)$, $F_3(Z)$ and $F_4(Z)$ are the contributions to the cumulative
distribution by the four terms of the right-hand side of the following equation

$$E(e^{x \sum \theta_i}) = \text{const.} \Big\{ 2(n + 2) \int_0^1 (1 - u)^{2n+3} e^{2xu}\, du \int_0^1 (1 - u)^{n+1} e^{xu}\, du$$

$$+ 2(2n + 3) \int_0^1 (1 - u)^{2n+4} e^{2xu}\, du - 4(n + 2) \int_0^1 (1 - u)^{2n+3} e^{2xu}\, du + \int_0^1 (1 - u)^{n+2} e^{xu}\, du \Big\},$$

where const. = (n + 2)(n + 3)(2n + 5). Proceeding according to the method
given in (a) we have
given in (a) we have 


(12) $F_4(Z) = [1 - (1 - Z)^{n+3}]/(n + 3) \qquad (0 \le Z \le 1),$
(13) $F_2(Z) = 2(2n + 3)[1 - (1 - Z/2)^{2n+5}]/(2n + 5) \qquad (0 \le Z \le 2),$
(14) $F_3(Z) = -4(n + 2)[1 - (1 - Z/2)^{2n+4}]/(2n + 4) \qquad (0 \le Z \le 2).$


Let us now consider $F_1(Z)$, which is the contribution of the first term. Let
$y_1$ and $y_2$ be distributed between 0 and 1 with densities $(1 - y_1)^{2n+3}$ and $(1 - y_2)^{n+1}$
respectively, then

$$F_1(Z) = 2(n + 2) \iint_{2y_1 + y_2 \le Z} (1 - y_1)^{2n+3} (1 - y_2)^{n+1}\, dy_1\, dy_2,$$

where Z goes from 0 to 3.

Fig. 1

Let us consider the distribution over the unit square OABC, Fig. 1, then for
Z ≤ 1, Z ≤ 2, and Z ≤ 3 we have to integrate over OLM, OCNP, and OCQRA,
where LM, NP and QR are the three lines given by $2y_1 + y_2 = Z$ according as
Z ≤ 1, Z ≤ 2, and Z ≤ 3.

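(i) The integration over OLM is given below. For Z ≤ 1,

$$F_{1,1}(Z) = 2(n + 2) \iint_{2y_1 + y_2 \le Z} (1 - y_1)^{2n+3} (1 - y_2)^{n+1}\, dy_1\, dy_2,$$

(15) $$F_{1,1}(Z) = 2(n + 2) \Big\{ [1 - (1 - Z/2)^{2n+4}]/(n + 2)(2n + 4) - 2^{n+2} [(3 - Z)/2]^{3n+6}\, [I_{2/(3-Z)}(2n + 4, n + 3) - I_{(2-Z)/(3-Z)}(2n + 4, n + 3)]/(n + 2) \Big\},$$

where

$$B(2n + 4, n + 3) = \int_0^1 y^{2n+3} (1 - y)^{n+2}\, dy, \qquad I_{(2-Z)/(3-Z)}(2n + 4, n + 3) = \int_0^{(2-Z)/(3-Z)} y^{2n+3} (1 - y)^{n+2}\, dy.$$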

(ii) The integration over OCNP is given below.

(16) $$F_{1,2}(Z) = 2(n + 2) \Big\{ [1 - (1 - Z/2)^{2n+4}]/(n + 2)(2n + 4) - 2^{n+2} [(3 - Z)/2]^{3n+6}\, \{B(2n + 4, n + 3) - I_{(2-Z)/(3-Z)}(2n + 4, n + 3)\}/(n + 2) \Big\} \qquad (1 \le Z \le 2).$$


(iii) In order to integrate over OCQRA, we shall integrate over the unit area
OCBA and subtract from this the value obtained by integrating over QRB.
Thus,

(17) $$F_{1,3}(Z) = 2(n + 2) \Big\{ 1/(n + 2)(2n + 4) - 2^{n+2} [(3 - Z)/2]^{3n+6}\, B(2n + 4, n + 3)/(n + 2) \Big\} \qquad (2 \le Z \le 3).$$
Hence the result for l = 3 can be expressed as

$$P_r(\theta_1 + \theta_2 + \theta_3 \le Z) = \text{const.}\, \{F_{1,1}(Z) + F_2(Z) + F_3(Z) + F_4(Z)\}$$

$$= \text{const.} \Big\{ 2(n + 2) \big\{ [1 - (1 - Z/2)^{2n+4}]/(n + 2)(2n + 4) - 2^{n+2} [(3 - Z)/2]^{3n+6}\, [I_{2/(3-Z)}(2n + 4, n + 3) - I_{(2-Z)/(3-Z)}(2n + 4, n + 3)]/(n + 2) \big\}$$

$$+ 2(2n + 3)[1 - (1 - Z/2)^{2n+5}]/(2n + 5) - 2[1 - (1 - Z/2)^{2n+4}] + [1 - (1 - Z)^{n+3}]/(n + 3) \Big\} \qquad (0 \le Z \le 1),$$

$$= \text{const.}\, \{F_{1,2}(Z) + F_2(Z) + F_3(Z) + F_4(1)\}$$

$$= \text{const.} \Big\{ 2(n + 2) \big\{ [1 - (1 - Z/2)^{2n+4}]/(n + 2)(2n + 4) - 2^{n+2} [(3 - Z)/2]^{3n+6}\, [B - I_{(2-Z)/(3-Z)}(2n + 4, n + 3)]/(n + 2) \big\}$$

$$+ 2(2n + 3)[1 - (1 - Z/2)^{2n+5}]/(2n + 5) - 2[1 - (1 - Z/2)^{2n+4}] + 1/(n + 3) \Big\} \qquad (1 \le Z \le 2),$$

and

$$= \text{const.}\, \{F_{1,3}(Z) + F_2(2) + F_3(2) + F_4(1)\}$$

$$= \text{const.} \Big\{ 2(n + 2) \big\{ 1/(n + 2)(2n + 4) - 2^{n+2} [(3 - Z)/2]^{3n+6}\, B/(n + 2) \big\} + 2(2n + 3)/(2n + 5) - 2 + 1/(n + 3) \Big\} \qquad (2 \le Z \le 3),$$

where const. = (n + 2)(n + 3)(2n + 5) and B = B(2n + 4, n + 3).
The exact distribution is obtained for l = 4 by a similar method. The final
results are available with the author and are not given here due to lack of space.
The method given in the above sections can be used to find the distribution
of the sum of roots of a determinantal equation of any order under the condition
m = 0.


5. Moments of the distributions. The moments can be obtained by expanding
the right-hand side of (6) in terms of x and then collecting the coefficients of x.
The moments for l = 2 have been derived here and the method is illustrated
below:


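(a) l = 2. Equation (9) gives

$$\int_{0 < \theta_2 < \theta_1 < 1} \prod (1 - \theta_i)^n (\theta_1 - \theta_2)\, e^{x \sum \theta_i} \prod d\theta_i = \text{const.} \left\{ 2 \int_0^1 (1 - u)^{2n+2} e^{2xu}\, du - \int_0^1 (1 - u)^{n+1} e^{xu}\, du \right\}$$

$$= \text{const.} \left\{ 2 \int_0^1 (1 - u)^{2n+2} \sum_{t=0}^{\infty} \frac{(2xu)^t}{t!}\, du - \int_0^1 (1 - u)^{n+1} \sum_{t=0}^{\infty} \frac{(xu)^t}{t!}\, du \right\}$$

$$= \text{const.} \left\{ 2 \sum_{t=0}^{\infty} \frac{(2x)^t}{t!} \frac{\Gamma(t + 1)\Gamma(2n + 3)}{\Gamma(2n + t + 4)} - \sum_{t=0}^{\infty} \frac{x^t}{t!} \frac{\Gamma(t + 1)\Gamma(n + 2)}{\Gamma(n + t + 3)} \right\}$$

$$= \text{const.} \left\{ \frac{2}{2n + 3} \left[ 1 + \frac{2x}{2n + 4} + \frac{(2x)^2}{(2n + 4)(2n + 5)} + \frac{(2x)^3}{(2n + 4)(2n + 5)(2n + 6)} + \cdots \right] - \frac{1}{n + 2} \left[ 1 + \frac{x}{n + 3} + \frac{x^2}{(n + 3)(n + 4)} + \frac{x^3}{(n + 3)(n + 4)(n + 5)} + \cdots \right] \right\}.$$

Thus

$$E(e^{x(\theta_1 + \theta_2)}) = 1 + \frac{3x}{n + 3} + \frac{x^2}{2!} \frac{12(n + 2)(4n + 11)}{(n + 3)(n + 4)(2n + 4)(2n + 5)} + \frac{x^3}{3!} \frac{120(n + 2)(n + 3)(4n + 13)}{(2n + 4)(2n + 5)(2n + 6)(n + 3)(n + 4)(n + 5)} + \cdots.$$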
Hence the first three moments of the sum of roots about the origin are

3/(n + 3),

6(4n + 11)/(n + 3)(n + 4)(2n + 5)

and

30(4n + 13)/(n + 3)(n + 4)(n + 5)(2n + 5).

The moments for l = 3 and 4 can be obtained in a similar way.
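The three moments for l = 2 can be reproduced exactly from the generating function. The sketch below is an illustration only, with hypothetical function names. It differentiates the two integrals term by term, using the exact value t! a!/(t + a + 1)! for the integral of u^t (1 - u)^a over (0, 1) and the normalizing constant (n + 2)(2n + 3) obtained by putting x = 0.

```python
from fractions import Fraction
from math import factorial


def beta_int(t, a):
    """Exact integral over [0, 1] of u**t * (1 - u)**a for non-negative integers."""
    return Fraction(factorial(t) * factorial(a), factorial(t + a + 1))


def raw_moment(t, n):
    """t-th moment about the origin of theta_1 + theta_2 (l = 2, m = 0).

    Read off the generating function
    const * {2 * int (1-u)^(2n+2) e^(2xu) du - int (1-u)^(n+1) e^(xu) du},
    whose t-th derivative at x = 0 brings down (2u)^t and u^t respectively.
    """
    const = (n + 2) * (2 * n + 3)
    return const * (2 * 2**t * beta_int(t, 2 * n + 2) - beta_int(t, n + 1))


if __name__ == "__main__":
    for n in range(3):
        print(n, [raw_moment(t, n) for t in range(4)])
```

Working in exact rational arithmetic avoids any rounding question when comparing with the closed forms.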
A NOTE ON THE POWER OF A NON-PARAMETRIC TEST
By F. J. Massey, Jr. 


University of Oregon 


1. Introduction. Let $x_1 < x_2 < \cdots < x_n$ be the ordered results of n independent
observations of a random variable X which has a continuous cumulative
distribution function F(x). The following test for the hypothesis that F(x) has
some specified form, say $F_0(x)$, has been suggested by Wolfowitz [1].

Form the cumulative distribution of the sample and obtain the maximum
deviation of this from $F_0(x)$. Thus if

$$S_n(x) = \begin{cases} 0 & \text{when } x < x_1, \\ k/n & \text{when } x_k \le x < x_{k+1}, \\ 1 & \text{when } x_n \le x, \end{cases}$$

the test statistic used would be

$$d = \max_x | F_0(x) - S_n(x) | \sqrt{n},$$

and the hypothesis would be rejected if d is large, say larger than $d_\alpha$ which is so
chosen that the probability of a type I error is α. The limiting distribution
(as n → ∞) of d has been tabled [2], and a short table of the distribution of d
for various small values of n (n ≤ 80) has been given [3].

The purpose of this note is as follows: 1. A lower bound for the power of the
test is given. 2. This test is shown to be consistent against any continuous
alternative $F(x) = F_1(x)$, where $F_1(x) \not\equiv F_0(x)$. 3. The test is shown to be biased for
finite n. 4. An indication is given of similar results for a two sample test.


2. Lower bound for the power function. Let $\Delta = \max_x | F_0(x) - F_1(x) |$ and
let $x_0$ be a value of x such that $\Delta = | F_0(x_0) - F_1(x_0) |$. The probability that
$d > d_\alpha$ is certainly not less than $\Pr\{\sqrt{n}\, | F_0(x_0) - S_n(x_0) | > d_\alpha\}$. This is the
same as

$$1 - \Pr\left\{ F_0(x_0) - \frac{d_\alpha}{\sqrt{n}} < S_n(x_0) < F_0(x_0) + \frac{d_\alpha}{\sqrt{n}} \right\},$$

which, since $S_n(x_0)$ is the proportion of observations falling less or equal to $x_0$,
is given by the binomial probability law.

If $F(x) = F_1(x)$ the probability of an observation being less than $x_0$ is $F_1(x_0)$.
Since $F_0(x_0) = F_1(x_0) \pm \Delta$ the above probability can be written as follows:

$$1 - \Pr\{ F_1(x_0) \pm \Delta - d_\alpha/\sqrt{n} < S_n(x_0) < F_1(x_0) \pm \Delta + d_\alpha/\sqrt{n} \}$$

$$= 1 - \Pr\{ \pm\Delta - d_\alpha/\sqrt{n} < S_n(x_0) - F_1(x_0) < \pm\Delta + d_\alpha/\sqrt{n} \}$$

$$= 1 - \Pr\left\{ \frac{-d_\alpha \pm \Delta\sqrt{n}}{\sqrt{F_1(x_0)(1 - F_1(x_0))}} < \frac{(S_n(x_0) - F_1(x_0))\sqrt{n}}{\sqrt{F_1(x_0)(1 - F_1(x_0))}} < \frac{d_\alpha \pm \Delta\sqrt{n}}{\sqrt{F_1(x_0)(1 - F_1(x_0))}} \right\}.$$





Δ is fixed. It has been found [3] by observation for samples of size ≤ 80 that
$d_\alpha$ actually decreases in size as n increases. For sufficiently large n both

$$-d_\alpha \pm \Delta\sqrt{n} \quad \text{and} \quad d_\alpha \pm \Delta\sqrt{n}$$

have the same sign and the law of large numbers indicates that the above
probability approaches zero and the expression approaches unity.

The last expression above can also be used as a lower bound of the power of 
the test for finite n. 
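The finite-n lower bound can be evaluated directly from the binomial law. The sketch below is only an illustration: the function name is hypothetical, and the value 1.36 used in the example is the familiar approximate 5 percent point of the limiting distribution of d, an assumption brought in from outside this note.

```python
from math import comb, sqrt


def power_lower_bound(n, f0_x0, f1_x0, d_alpha):
    """Pr{ sqrt(n) * |F0(x0) - S_n(x0)| > d_alpha } when n * S_n(x0) is
    Binomial(n, F1(x0)); this is a lower bound for the power of the test."""
    half_width = d_alpha / sqrt(n)
    inside = 0.0
    for k in range(n + 1):
        # k / n is the value of S_n(x0); sum the probability of staying in the band
        if abs(k / n - f0_x0) <= half_width:
            inside += comb(n, k) * f1_x0**k * (1 - f1_x0) ** (n - k)
    return 1.0 - inside


if __name__ == "__main__":
    # Hypothetical alternative: F0(x0) = 0.5, F1(x0) = 0.7, so Delta = 0.2.
    for n in (10, 50, 100, 400):
        print(n, power_lower_bound(n, 0.5, 0.7, 1.36))
```

With Δ fixed the bound climbs toward one as n grows, which is the consistency statement in numerical form.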

For large values of n this probability is given approximately by the normal
distribution. Thus we can write for large n:

$$\text{power} > 1 - \int_{\lambda_1}^{\lambda_2} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt,$$

where

$$\lambda_1 = (-d_\alpha \pm \Delta\sqrt{n}) / \sqrt{F_1(x_0)(1 - F_1(x_0))}$$

and

$$\lambda_2 = (d_\alpha \pm \Delta\sqrt{n}) / \sqrt{F_1(x_0)(1 - F_1(x_0))}.$$

If n is so large that $\lambda_1$ and $\lambda_2$ are of the same sign and sufficiently different
from zero we can replace $F_1(x_0)$ by ½ and not decrease the value of the integral.
In this case we might use as a working formula

$$\lambda_1 = 2(-d_\alpha \pm \Delta\sqrt{n}), \qquad \lambda_2 = 2(d_\alpha \pm \Delta\sqrt{n}).$$

Since

$$1 - \frac{1}{\sqrt{2\pi}} \int_{\lambda_1}^{\lambda_2} e^{-t^2/2}\, dt$$

approaches one as n tends to infinity, the power, which is larger, must also
approach one, and thus the test is consistent.

To demonstrate the biasedness of the test for fixed n consider the following
picture.
The $F_0(x)$ is shown as a heavy line and an alternative $F_1(x)$ as a dash-dot line.
$F_1(x)$ coincides with $F_0(x)$ except between the point x = a and x = b. If $S_n(x)$
falls outside of the indicated band at any point we agree to reject the hypothesis
$F(x) = F_0(x)$. If $F(x) = F_1(x)$ the $S_n(x)$ has no chance of being outside
the band between x = a and x = c, less chance between x = c and x = b than if
$F(x) = F_0(x)$, and the same chance for x larger than b. This indicates that the
probability of rejecting $F(x) = F_0(x)$, if actually $F(x) = F_1(x)$, is less than
the probability of rejecting $F(x) = F_0(x)$ if this is actually true. Thus the test
is biased.


3. Two sample test. Let $S_n(x)$ and $S'_m(x)$ be the cumulative distributions
observed for samples of sizes n and m from two populations having continuous
cumulative distribution functions F(x) and F'(x) respectively. Under the
assumption that F(x) = F'(x) the limiting distribution (as n and m tend to
infinity) of $d' = (n^{-1} + m^{-1})^{-1/2} \max_x | S_n(x) - S'_m(x) |$ has been found and tabled
[4], but the distribution of this statistic for small n and m is not known.

Suppose we wish to test the hypothesis that F(x) = F'(x) at level of
significance α and agree to reject this if d' is larger than $d'_\alpha$, where $d'_\alpha$ is the value
which would be exceeded a proportion α of the time if the hypothesis is true.
The values of $d'_\alpha$ are not known for small samples but are for the limiting case [4].

The same argument as in Section 2 gives a limiting lower bound to the power
of the test in terms of

$$\Delta = | F(x_0) - F'(x_0) |,$$

where $x_0$ is the value of x which maximizes | F(x) - F'(x) |, to be

$$1 - \int_{\lambda_1}^{\lambda_2} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt,$$

where






/ 
1 — 























1, F@ot =F) 


m 





Since this lower bound approaches one as n and m approach infinity the power 
also approaches one and the test is consistent. 
ON OPTIMUM SELECTIONS FROM MULTINORMAL POPULATIONS^1


By Z. W. BIRNBAUM AND D. G. CHAPMAN^2


University of Washington 






















1. Introduction. Let $Y_1, Y_2, \cdots, Y_n$ be scores in n admission tests such as
those used in educational institutions, personnel selection, or testing of materials,
and let these scores be used as a basis for selecting a sub-population Π*
from an initial population Π. This selection is usually performed in such a
manner that an achievement or performance score X has a distribution in Π*,
which shows some required improvement over the distribution of X in Π; such
an improvement may for example consist in changing the expectation E(X) of
X in Π to a pre-assigned value E*(X) in Π*. Among all selection procedures
based on $Y_1, \cdots, Y_n$ and achieving the required improvement of the distribution
of X, it appears desirable to find those which retain as large a portion of Π as
possible. It will be shown that under certain assumptions the linear truncations
studied in an earlier paper [1] are such optimal selections.


1 Presented at the New York meeting of the Institute of Mathematical Statistics on
December 27, 1949.

2 Research done under the sponsorship of the Office of Naval Research.


2. Selection, truncation, linear truncation. Let the frequency of individuals
with the scores $(X, Y_1, \cdots, Y_n)$ be $F(X, Y_1, \cdots, Y_n)$ in Π and
$F^*(X, Y_1, \cdots, Y_n)$ in Π*. Since Π* was obtained by selection from Π, we have
F*/F ≤ 1, and since the selection was made solely on the basis of the values of
$Y_1, \cdots, Y_n$, the ratio F*/F is independent of X. We thus have

$$\frac{F^*(X, Y_1, \cdots, Y_n)}{F(X, Y_1, \cdots, Y_n)} = \varphi(Y_1, \cdots, Y_n)$$

and

(2.1) $$0 \le \varphi(Y_1, \cdots, Y_n) \le 1.$$

Let

$$N = \int \cdots \int F(X, Y_1, \cdots, Y_n)\, dX\, dY_1 \cdots dY_n,$$

$$N^* = \int \cdots \int F^*(X, Y_1, \cdots, Y_n)\, dX\, dY_1 \cdots dY_n$$

be the number of individuals in Π and Π*, and $f(X, Y_1, \cdots, Y_n)$ and
$f^*(X, Y_1, \cdots, Y_n)$ the distribution densities in Π and Π*, respectively, so that
F = Nf, F* = N*f* and

$$\int \cdots \int f\, dX\, dY_1 \cdots dY_n = \int \cdots \int f^*\, dX\, dY_1 \cdots dY_n = 1.$$

We then have

$$N^* f^* = \varphi N f,$$

and

(2.2) $$\frac{N^*}{N} = \int \cdots \int \varphi(Y_1, \cdots, Y_n)\, f(X, Y_1, \cdots, Y_n)\, dX\, dY_1 \cdots dY_n.$$

Thus any selection of a subpopulation Π* from Π based only on $Y_1, \cdots, Y_n$
defines a $\varphi(Y_1, \cdots, Y_n)$ satisfying (2.1). Conversely, if the frequencies
$F(X, Y_1, \cdots, Y_n)$ in Π are given, any measurable $\varphi(Y_1, \cdots, Y_n)$ satisfying (2.1)
defines new frequencies F* = φF and hence a selection from Π based only on
$Y_1, \cdots, Y_n$. These considerations lead to the following definitions:
A measurable function $\varphi(Y_1, \cdots, Y_n)$ which satisfies (2.1) is called a selection
in $Y_1, \cdots, Y_n$. If, in particular, φ is the characteristic function of a set Ω in
$(Y_1, \cdots, Y_n)$, that is φ = 1 in Ω and φ = 0 in the complement of Ω, then the
selection φ will be called a truncation in $Y_1, \cdots, Y_n$ to the set Ω. If Ω is defined
by a condition of the form

$$\sum_{j=1}^{n} a_j Y_j \ge t$$

with constant $a_j$, t, then the truncation to the set Ω will be called a linear
truncation in $Y_1, \cdots, Y_n$.
In view of (2.2) we will refer to

(2.3) $$r(\varphi) = \int \cdots \int \varphi(Y_1, \cdots, Y_n)\, f(X, Y_1, \cdots, Y_n)\, dX\, dY_1 \cdots dY_n$$

as the fraction retained in the selection φ.


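3. A lemma. We will need the following slight generalization of the fundamental
lemma of Neyman-Pearson (cf. [2]).

LEMMA. Let $G(Y_1, \cdots, Y_n)$, $G_1(Y_1, \cdots, Y_n)$, $\cdots$, $G_m(Y_1, \cdots, Y_n)$ be given
integrable functions and $c_1, \cdots, c_m$ given constants, and let (φ) be the family of
all measurable functions $\varphi(Y_1, \cdots, Y_n)$ which satisfy the conditions

(3.1) $$0 \le \varphi(Y_1, \cdots, Y_n) \le 1,$$

(3.2) $$\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \varphi(Y_1, \cdots, Y_n)\, G_i(Y_1, \cdots, Y_n)\, dY_1 \cdots dY_n = c_i \qquad \text{for } i = 1, \cdots, m.$$

If there exist constants $k_1, \cdots, k_m$ such that the characteristic function
$\varphi_0(Y_1, \cdots, Y_n)$ of the set

$$E = \mathop{E}_{(Y_1, \cdots, Y_n)} \left\{ G \ge \sum_{i=1}^{m} k_i G_i \right\}$$

belongs to (φ), then

(3.3) $$\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \varphi_0\, G\, dY_1 \cdots dY_n \ge \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \varphi\, G\, dY_1 \cdots dY_n$$

for any φ in (φ).
PROOF: We have $\varphi_0 = 1 \ge \varphi$ in E and $\varphi_0 = 0 \le \varphi$ in the complement of E, hence

$$\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \left( G - \sum_{i=1}^{m} k_i G_i \right) \varphi_0\, dY_1 \cdots dY_n \ge \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \left( G - \sum_{i=1}^{m} k_i G_i \right) \varphi\, dY_1 \cdots dY_n,$$

and (3.3) follows since $\varphi_0$ and φ fulfill (3.2).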


4. Selection from a multivariate normal population, for which the fraction
retained is maximum. From now on we assume that the conditional distribution
of X for given $Y_1, Y_2, \cdots, Y_n$ is normal with a mean which is a linear function
of the Y's and with a variance which is independent of them, i.e.,

(4.1) $$f(X \mid Y_1, Y_2, \cdots, Y_n) = \frac{1}{\sqrt{2\pi}\, \sigma} \exp\left[ -\frac{\left( X - \sum_{i=1}^{n} p_i Y_i \right)^2}{2\sigma^2} \right].$$

Let $Q(Y_1, \cdots, Y_n)$ denote the marginal density of $Y_1, \cdots, Y_n$.
THEOREM 1. A selection such that
1° in Π* a proportion at most equal to a given proper fraction ε has values of X
below $X_0$, i.e. the ε-quantile in Π* is greater than or equal to $X_0$, when $X_0$ is a
given number greater than the ε-quantile in Π,
2° the fraction retained is maximum,

is a linear truncation.
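PROOF: We have to maximize

(4.2) $$r = \int \cdots \int \varphi(Y_1, \cdots, Y_n)\, Q(Y_1, \cdots, Y_n)\, dY_1 \cdots dY_n$$

under the condition

$$\frac{\int_{-\infty}^{X_0} \int \cdots \int \varphi(Y_1, \cdots, Y_n)\, Q(Y_1, \cdots, Y_n)\, f(X \mid Y_1, \cdots, Y_n)\, dY_1 \cdots dY_n\, dX}{\int_{-\infty}^{+\infty} \int \cdots \int \varphi(Y_1, \cdots, Y_n)\, Q(Y_1, \cdots, Y_n)\, f(X \mid Y_1, \cdots, Y_n)\, dY_1 \cdots dY_n\, dX} \le \epsilon.$$

Substituting the expression (4.1) for $f(X \mid Y_1, \cdots, Y_n)$ and integrating with
respect to X we may rewrite this in the form

(4.3) $$L(\varphi) = \int \cdots \int \varphi(Y_1, \cdots, Y_n)\, Q(Y_1, \cdots, Y_n) \left[ \Psi\left( \frac{X_0 - \sum_{i=1}^{n} p_i Y_i}{\sigma} \right) - \epsilon \right] dY_1 \cdots dY_n \le 0,$$

where

$$\Psi(u) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{u} e^{-w^2/2}\, dw,$$

and we have to maximize (4.2) under condition (4.3).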

Without loss of generality the inequality L(φ) ≤ 0 in (4.3) may be replaced
by equality. For if we had a selection $\varphi_1$ which maximizes (4.2) and satisfies (4.3)
with a strict inequality $L(\varphi_1) < 0$, then $\varphi_1$ could not be equal to 1 almost
everywhere since then we would have F* = F almost everywhere and $X_0$ would be
equal to the ε-quantile in Π, in contradiction with 1°; hence $\varphi_2 = \varphi_1 + a(1 - \varphi_1)$
for sufficiently small a > 0 would also satisfy (4.3) with a strict inequality but
would yield $r(\varphi_2) > r(\varphi_1)$.

To solve our problem we now have to maximize (4.2) under the condition

(4.4) $$L(\varphi) = 0.$$

Applying the lemma of Section 3, with m = 1, and

$$G(Y_1, \cdots, Y_n) = Q(Y_1, \cdots, Y_n),$$

$$G_1(Y_1, \cdots, Y_n) = Q(Y_1, \cdots, Y_n) \left[ \Psi\left( \frac{X_0 - \sum_{i=1}^{n} p_i Y_i}{\sigma} \right) - \epsilon \right],$$

we conclude that the selection satisfying 1° and 2° will be the characteristic
function $\varphi_0(Y_1, \cdots, Y_n)$ of the set defined by

(4.5) $$1 \ge k \left[ \Psi\left( \frac{X_0 - \sum_{i=1}^{n} p_i Y_i}{\sigma} \right) - \epsilon \right],$$

provided k can be determined so that $\varphi_0$ satisfies (4.4).


To find such a k we consider

$$I(t) = \int \cdots \int_{\sum_{i=1}^{n} p_i Y_i \ge t} Q(Y_1, \cdots, Y_n) \left[ \Psi\left( \frac{X_0 - \sum_{i=1}^{n} p_i Y_i}{\sigma} \right) - \epsilon \right] dY_1 \cdots dY_n.$$

As t tends to -∞, I(t) tends to L(1), where L was defined by (4.3). Since the
ε-quantile in Π was less than $X_0$ it follows that I(-∞) = L(1) > 0. Since
I(t) < 0 for large t, there exists $t_0$ such that $I(t_0) = 0$, and clearly,

$$\Psi\left( \frac{X_0 - t_0}{\sigma} \right) - \epsilon > 0.$$

Setting in (4.5) $k = [\Psi((X_0 - t_0)/\sigma) - \epsilon]^{-1}$ one obtains a $\varphi_0$ such that
$L(\varphi_0) = I(t_0) = 0$.

The selection $\varphi_0$ is the linear truncation to the set $\sum_{i=1}^{n} p_i Y_i \ge t_0$.
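The cutting score of Theorem 1 can be located numerically in the simplest case. The sketch below rests on assumptions not made in the paper: a single admission score (n = 1) whose marginal density Q is standard normal, a positive coefficient p, and hypothetical function and parameter names. It finds the cut-off c0 on the score Y, so that $t_0 = p\, c_0$, by bisection on I(t).

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal: cdf plays the role of Psi, pdf that of Q


def excess(c, x0, p, sigma, eps, upper=10.0, steps=4000):
    """I(t) at t = p * c for one standard-normal score Y:
    integral over y >= c of Q(y) * [Psi((x0 - p*y)/sigma) - eps] dy (Simpson's rule,
    truncated at y = upper where the normal tail is negligible)."""
    h = (upper - c) / steps

    def g(y):
        return STD.pdf(y) * (STD.cdf((x0 - p * y) / sigma) - eps)

    total = g(c) + g(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(c + i * h)
    return total * h / 3


def cutting_score(x0, p, sigma, eps, lower=-10.0, tol=1e-10):
    """Bisection for c0 with I(p * c0) = 0. The bracketed term changes sign at
    y_star, so I is positive at `lower` and negative at y_star (assumes p > 0)."""
    y_star = (x0 - sigma * STD.inv_cdf(eps)) / p
    lo, hi = lower, y_star
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid, x0, p, sigma, eps) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical numbers: X has unit variance in the whole population.
    c0 = cutting_score(x0=-1.0, p=0.6, sigma=0.8, eps=0.1)
    print("cut-off on Y:", c0, "fraction retained:", 1 - STD.cdf(c0))
```

In this one-score setting the retained fraction is simply the normal tail area above c0; with several scores the same bisection would be applied to the weighted score.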

By a similar and somewhat simpler argument one proves the following theorem.

THEOREM 2. A selection such that

1° in Π* the mean of X has a value greater than or equal to a pre-assigned number
m > 0,

2° the fraction retained is maximum,
is a linear truncation to a set $\sum_{i=1}^{n} p_i Y_i \ge t_0$.

An immediate consequence of Theorems 1 and 2 is that a linear truncation,
using a properly determined weighted score $\sum_{i=1}^{n} p_i Y_i$ and cutting score $t_0$, is
more economical than any truncation to a set $Y_j \ge t_j$, $j = 1, 2, \cdots, n$, that is
than any truncation performed on each admission score separately.
THE DISTRIBUTION OF DISTANCE IN A HYPERSPHERE
By J. M. HAMMERSLEY 


University of Oxford 
1. Summary. Deltheil ([1], pp. 114-120) has considered the distribution of
distance in an n-dimensional hypersphere. In this paper I put his results (17)
in a more compact form (16); and I investigate in greater detail the asymptotic
form of the distribution for large n, for which the rather surprising result emerges
that this distance is almost always nearly equal to the distance between the
extremities of two orthogonal radii. I came to study this distribution by the
need to compute a doubly-threefold integral, which measures the damage caused
to plants by the presence of radioactive tracers in their fertilizers; for the
distribution affords a method of evaluating numerically certain multiple integrals.
I hope to describe elsewhere this application of the theory.


2. Derivation of the frequency function. Let $T_1$ and $T_2$ be vector spaces of n
and 2n dimensions respectively. Let P and Q be any pair of points in $T_1$. Denote
by (PQ) the point in $T_2$, whose first n coordinates are the coordinates of P
in $T_1$ and whose last n coordinates are the coordinates of Q in $T_1$. Let {P} and
{Q} be point sets in $T_1$, and let {PQ} be the point set in $T_2$ such that (PQ) ∈ {PQ}
if and only if both P ∈ {P} and Q ∈ {Q}. Let $M_1\{P\}$ denote the n-dimensional
measure of the point set {P} in $T_1$, and let $M_2\{PQ\}$ denote the 2n-dimensional
measure of the point set {PQ} in $T_2$. Then

(1) $$M_2\{PQ\} = \int_{\{P\}} M_1\{Q\}\, dM_1\{P\}.$$
{P} 


Let R be a fixed point in $T_1$; and let $S_n(a)$ be the n-dimensional hypersphere
in $T_1$ with centre R and radius a. Let A and B be any two points chosen at
random in $S_n(a)$, the distributions of A and B being independent and uniform
over the interior of $S_n(a)$. Denote the distance AB by r; and let λ = r/2a,
so that λ may take any value in the interval 0 ≤ λ ≤ 1. We require the
frequency function of λ, which we shall denote by $f_n(\lambda)$.

The volume content of $S_n(a)$ is

(2) $$V_n(a) = \pi^{n/2} a^n / \Gamma(\tfrac{1}{2}n + 1);$$

and the content of the segment of the surface of $S_n(a)$ bounded by a right
hyperspherical cone, whose vertex is at R and whose line generators make a fixed
semi-vertical angle θ with a fixed radius of $S_n(a)$, is

(3) $$U_n(a, \theta) = \frac{2\pi^{(n-1)/2} a^{n-1}}{\Gamma(\tfrac{1}{2}(n - 1))} \int_0^{\theta} \sin^{n-2}\phi\, d\phi.$$

As a particular case of (3), the whole surface of $S_n(a)$ has content

(4) $$U_n(a, \pi) = 2\pi^{n/2} a^{n-1} / \Gamma(\tfrac{1}{2}n).$$

Let {AB} be the point set in $T_2$ such that (AB) ∈ {AB} if and only if the
corresponding points A and B satisfy all the inequalities

(5) 0 ≤ RA ≤ a, 0 ≤ RB ≤ a, r ≤ AB ≤ r + dr.

Then, by the definition of $f_n(\lambda)$,

$$M_2\{AB\} \propto f_n(r/2a)\, dr;$$

but since

$$\int_0^{2a} f_n(r/2a)\, dr/2a = 1,$$

we have

(6) $$M_2\{AB\} = V_n^2\, f_n(r/2a)\, dr/2a = p_n(r, a)\, dr, \text{ say}.$$

Consider also the point set {CD} in $T_2$ such that (CD) ∈ {CD} if and only if
the corresponding points C and D satisfy all the inequalities

(7) 0 ≤ RC ≤ a + da, a ≤ RD ≤ a + da, r ≤ CD ≤ r + dr.

For each fixed D of {D}, C is constrained to lie on the segment of the
hyperspherical shell of thickness dr, radius r, and centre D, bounded by the
intersection of this shell with $S_n(a + da)$. The hyperspherical cone, with vertex D,
whose line generators all pass through this intersection, has a semi-vertical
angle θ given by

(8) cos θ = r/2a = λ;

and so, from (3), the $M_1$ of all C which satisfy (7) for each fixed D is
$U_n(r, \arccos \lambda)\, dr$. On the other hand the $M_1$ of all D which satisfy (7) is the content of
the hyperspherical shell of thickness da, radius a, and centre R, and is thus
$U_n(a, \pi)\, da$ by virtue of (4). Consequently, from (1)

(9) $$M_2\{CD\} = U_n(r, \arccos \lambda)\, U_n(a, \pi)\, da\, dr.$$

On the other hand, by symmetry, $M_2\{CD\} = \tfrac{1}{2} M_2\{EF\}$, where (EF) ∈ {EF}
if and only if the corresponding points E and F satisfy either all the inequalities

0 ≤ RE ≤ a + da, a ≤ RF ≤ a + da, r ≤ EF ≤ r + dr,

or all the inequalities

0 ≤ RF ≤ a + da, a ≤ RE ≤ a + da, r ≤ EF ≤ r + dr.

We can express this in another way by saying that (EF) ∈ {EF} if and only if
the corresponding points E and F satisfy all the inequalities

0 ≤ RE ≤ a + da, 0 ≤ RF ≤ a + da, r ≤ EF ≤ r + dr,

but do not satisfy all the inequalities

0 ≤ RE ≤ a, 0 ≤ RF ≤ a, r ≤ EF ≤ r + dr.

From this second point of view we see that

$$M_2\{EF\} = p_n(r, a + da)\, dr - p_n(r, a)\, dr = \frac{\partial}{\partial a}\, p_n(r, a)\, dr\, da;$$

and so

(10) $$M_2\{CD\} = \tfrac{1}{2} \frac{\partial}{\partial a}\, p_n(r, a)\, dr\, da.$$


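Then from (2), (3), (4), (6), (9), and (10),

(11) $$\frac{1}{2} \frac{\partial}{\partial a} \left\{ \left[ \frac{\pi^{n/2} a^n}{\Gamma(\tfrac{1}{2}n + 1)} \right]^2 f_n\!\left( \frac{r}{2a} \right) \frac{1}{2a} \right\} = \left\{ \frac{2\pi^{(n-1)/2} r^{n-1}}{\Gamma(\tfrac{1}{2}(n - 1))} \int_0^{\arccos(r/2a)} \sin^{n-2}\phi\, d\phi \right\} \left\{ \frac{2\pi^{n/2} a^{n-1}}{\Gamma(\tfrac{1}{2}n)} \right\}.$$

By performing the partial differentiation on the left-hand side, then substituting
z = cos φ and r = 2aλ, and using the relations

$$\Gamma(\tfrac{1}{2}n + 1) = \tfrac{1}{2}n\, \Gamma(\tfrac{1}{2}n), \qquad \pi^{1/2}\, \Gamma(n + 1) = 2^n\, \Gamma(\tfrac{1}{2}n + \tfrac{1}{2})\, \Gamma(\tfrac{1}{2}n + 1),$$

$$B(\tfrac{1}{2}n + \tfrac{1}{2}, \tfrac{1}{2}n + \tfrac{1}{2}) = \{\Gamma(\tfrac{1}{2}n + \tfrac{1}{2})\}^2 / \Gamma(n + 1),$$

we reduce (11) to the form

(12) $$(2n - 1) f_n(\lambda) - \lambda f_n'(\lambda) = \frac{2n(n - 1) \lambda^{n-1}}{B(\tfrac{1}{2}n + \tfrac{1}{2}, \tfrac{1}{2}n + \tfrac{1}{2})} \int_{\lambda}^{1} (1 - z^2)^{(n-3)/2}\, dz.$$

We multiply (12) by $-\lambda^{-2n}$ and use the reduction formula

(13) $$(n - 1) \int_{\lambda}^{1} (1 - z^2)^{(n-3)/2}\, dz = n \int_{\lambda}^{1} (1 - z^2)^{(n-1)/2}\, dz + \lambda (1 - \lambda^2)^{(n-1)/2}.$$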


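Each side of the resulting equation is a perfect differential coefficient, and upon
integration we obtain

(14) $$f_n(\lambda) = \frac{2n \lambda^{n-1}}{B(\tfrac{1}{2}n + \tfrac{1}{2}, \tfrac{1}{2}n + \tfrac{1}{2})} \int_{\lambda}^{1} (1 - z^2)^{(n-1)/2}\, dz + C \lambda^{2n-1},$$

where C is the constant of integration. We obtain the cumulative distribution
function by integrating (14) over 0 to λ,

(15) $$F_n(\lambda) = (2\lambda)^n I_{1-\lambda^2}(\tfrac{1}{2}n + \tfrac{1}{2}, \tfrac{1}{2}) + I_{\lambda^2}(\tfrac{1}{2}n + \tfrac{1}{2}, \tfrac{1}{2}n + \tfrac{1}{2}) + C \lambda^{2n}/2n,$$

where $I_x(p, q)$ is the incomplete beta-function ratio

$$I_x(p, q) = \int_0^x z^{p-1} (1 - z)^{q-1}\, dz / B(p, q)$$

tabulated by Pearson [2]. Putting λ = 1 in (15) we get

$$1 = F_n(1) = 1 + C/2n;$$

so C = 0, and we have the final result

(16) $$f_n(\lambda) = 2^n n \lambda^{n-1} I_{1-\lambda^2}(\tfrac{1}{2}n + \tfrac{1}{2}, \tfrac{1}{2}).$$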

This compact form may be compared with Deltheil’s expression [1] for the fre- 
quency function of r, namely 


2 n—!1 a 
= yr n— r 
(17 gn(r) = 2 / p "le (") dp, 
a’ r/2 p 


where 
3n—6O i tr 
h,(2 sin @) = sin” “¢ as / | sin” ~¢ d¢, 
0 0 


expressions which he evaluates only for the particular cases n 
Interesting particular cases of (16) are

(18) $$f_1(\lambda) = 2(1 - \lambda), \qquad f_2(\lambda) = (16\lambda/\pi) \{ \arccos \lambda - \lambda (1 - \lambda^2)^{1/2} \}, \qquad f_3(\lambda) = 12\lambda^2 (1 - \lambda)^2 (2 + \lambda),$$

which give the appropriate frequency functions for a line, a circle, and a sphere
respectively.
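The compact form (16) can be evaluated without tables. The sketch below is an illustration with hypothetical function names; it uses the substitution z = 1 - w², which turns the incomplete beta integral with second argument ½ into twice the integral of (1 - w²)^((n-1)/2) from λ to 1, and compares the result with the closed forms for a line, a circle and a sphere.

```python
from math import acos, gamma, pi, sqrt


def beta_fn(p, q):
    """Complete beta function B(p, q)."""
    return gamma(p) * gamma(q) / gamma(p + q)


def incomplete_beta_ratio(n, lam, steps=2000):
    """I_{1 - lam^2}(n/2 + 1/2, 1/2), computed by Simpson's rule as
    2 * int_{lam}^{1} (1 - w^2)^((n-1)/2) dw / B(n/2 + 1/2, 1/2)."""
    h = (1.0 - lam) / steps

    def g(w):
        return max(0.0, 1.0 - w * w) ** ((n - 1) / 2)

    total = g(lam) + g(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(lam + i * h)
    return 2.0 * (total * h / 3) / beta_fn((n + 1) / 2, 0.5)


def f(n, lam):
    """Frequency function (16) of lambda = r / 2a in n dimensions."""
    return 2**n * n * lam ** (n - 1) * incomplete_beta_ratio(n, lam)


if __name__ == "__main__":
    lam = 0.3
    print(f(1, lam), 2 * (1 - lam))
    print(f(2, lam), 16 * lam / pi * (acos(lam) - lam * sqrt(1 - lam * lam)))
    print(f(3, lam), 12 * lam**2 * (1 - lam) ** 2 * (2 + lam))
```

For odd n the integrand is a polynomial of low degree in the cases shown, so the quadrature is essentially exact; for n = 2 the square-root behaviour at w = 1 limits the accuracy to a few decimal places.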
3. Recurrence relations and moments of the distribution. From (13) and (14) 
we have a recurrence relation for penadjacent values of n, 


oT 
(19) = — p2fh® _ __ 2h) 








ee = 7h _ ,2\(n—1/2 
.-) Oar er: 


In connection with (18) this shows that

(20) $$f_{2n+1}(\lambda) = P_{4n+1}(\lambda), \qquad f_{2n}(\lambda) = P_{2n-1}(\lambda) \arccos \lambda + P_{4n-2}(\lambda) (1 - \lambda^2)^{1/2},$$

where $P_N(\lambda)$ denotes an unspecified polynomial in λ of degree N or less.
From (16) the rth moment of $f_n(\lambda)$ about λ = 0 is

(21) $$\mu_r' = \left\{ \frac{n\, \Gamma(n + 1)}{\Gamma(\tfrac{1}{2}n + \tfrac{1}{2})} \right\} \left\{ \frac{\Gamma(\tfrac{1}{2}n + \tfrac{1}{2}r + \tfrac{1}{2})}{(n + r)\, \Gamma(n + \tfrac{1}{2}r + 1)} \right\}.$$

I have not been able to obtain the characteristic function of $f_n(\lambda)$ explicitly:
from (21) it appears to be of a higher type than the hypergeometric function.
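4. The asymptotic form of the distribution for large n. The distribution function
is, by (15),

(22) $$F_n(\lambda) = (2\lambda)^n I_{1-\lambda^2}(\tfrac{1}{2}n + \tfrac{1}{2}, \tfrac{1}{2}) + I_{\lambda^2}(\tfrac{1}{2}n + \tfrac{1}{2}, \tfrac{1}{2}n + \tfrac{1}{2}).$$

We show firstly that as n → ∞ the first term of this expression tends to zero.
This term is clearly zero if λ = 0. If λ > 0

$$\int_0^{1-\lambda^2} z^{(n-1)/2} (1 - z)^{-1/2}\, dz \le \frac{1}{\lambda} \int_0^{1-\lambda^2} z^{(n-1)/2}\, dz = \frac{(1 - \lambda^2)^{(n+1)/2}}{\tfrac{1}{2}(n + 1) \lambda}.$$

Hence

$$(2\lambda)^n I_{1-\lambda^2}(\tfrac{1}{2}n + \tfrac{1}{2}, \tfrac{1}{2}) \le \frac{(2\lambda)^n\, \Gamma(\tfrac{1}{2}n + 1)}{\pi^{1/2}\, \Gamma(\tfrac{1}{2}n + \tfrac{1}{2})} \cdot \frac{(1 - \lambda^2)^{(n+1)/2}}{(\tfrac{1}{2}n + \tfrac{1}{2}) \lambda}$$

$$\le \lambda^{-1} (1 - \lambda^2)^{1/2} \{4\lambda^2 (1 - \lambda^2)\}^{n/2} \times \frac{\Gamma(\tfrac{1}{2}n + 1)}{\pi^{1/2}\, \Gamma(\tfrac{1}{2}n + \tfrac{3}{2})} \to 0$$

as n → ∞. Secondly, as n → ∞

$$I_{\lambda^2}(\tfrac{1}{2}n + \tfrac{1}{2}, \tfrac{1}{2}n + \tfrac{1}{2}) \sim N_{\lambda^2}(\tfrac{1}{2}, 1/4(n + 1)) \sim N_{\lambda^2}(\tfrac{1}{2}, 1/4n),$$

(see Cramér [3] p. 252 with p = q = ½), where $N_x(\mu, \sigma^2)$ is the normal cumulative
distribution function of x for mean μ and variance σ². Hence λ is asymptotically
distributed as $N_\lambda(1/\sqrt{2}, 1/8n)$; and the asymptotic distribution of r is
$N_r(a\sqrt{2}, a^2/2n)$. This establishes the result stated in the summary.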
It can also be proved, by considering the limiting form of the recurrence relation
(19), that the frequency function $f_n$ is asymptotically normal. The main
difficulty of proving this fact lies in showing that the frequency function actually
possesses a limiting form; and the proof is rather too long to be given here.
A NOTE ON THE ASYMPTOTIC SIMULTANEOUS DISTRIBUTION OF
THE SAMPLE MEDIAN AND THE MEAN DEVIATION FROM 
THE SAMPLE MEDIAN 


By R. K. ZEIGLER 
Bradley University 


Consider a random sample of \(2k + 1\) values from a one-dimensional distribu-
tion of the continuous type with cumulative distribution function (cdf) \(F(x)\)
and probability density function (pdf) \(f(x) = F'(x)\). Let the mean, standard
deviation and median of the distribution be denoted by \(m\), \(\sigma\) and \(\theta\) respectively
(\(\theta\) assumed to be unique). We shall suppose that in some neighborhood of
\(x = \theta\), \(f(x)\) has a continuous derivative \(f'(x)\).

If we arrange the sample values in ascending order of magnitude:

\[x_1 < x_2 < \cdots < x_{2k+1},\]

there is a unique sample median \(x_{k+1}\) which we shall denote by \(\xi\). The mean
deviation from the sample median is then defined by

\[M = \frac{1}{2k + 1}\sum_{i=1}^{2k+1} |x_i - \xi|.\]

In the material that follows we shall assume that the sample items have been
ordered only to the extent that \(k\) of them are less than \(\xi\) and \(k\) of them are greater
than \(\xi\).

We then have the following

THEOREM. Let \(f(x)\) be a pdf with finite second moment, continuous at \(x = \theta\) with
\(f(\theta) \ne 0\). Then the simultaneous distribution of \(\xi\) and \(M\) is asymptotically normal.
The means of the limiting distribution are \(\theta\), the population median, and \(\mu'\), the
mean deviation from the population median, while the asymptotic variances are
\(1/\{4f^2(\theta)\,2k\}\) and \(\{(m - \theta)^2 + \sigma^2 - \mu'^2\}/2k\). The asymptotic expression for the
correlation coefficient is \((m - \theta)/\sqrt{(m - \theta)^2 + \sigma^2 - \mu'^2}\).
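
As an illustration of the theorem, the asymptotic variances and correlation can be evaluated for particular populations. The sketch below is not part of the original note; the function name and the two example populations (standard normal and unit exponential) are assumptions chosen because their constants m, σ, θ, μ′ and f(θ) are known in closed form.

```python
import math

def asymptotic_parameters(m, sigma, theta, mu_prime, f_theta, k):
    """Asymptotic variances of the sample median and of M, and their correlation,
    for a sample of 2k + 1 values, following the formulas stated in the theorem."""
    spread = (m - theta) ** 2 + sigma ** 2 - mu_prime ** 2
    var_median = 1.0 / (4.0 * f_theta ** 2 * 2 * k)
    var_mean_dev = spread / (2 * k)
    rho = (m - theta) / math.sqrt(spread)
    return var_median, var_mean_dev, rho

k = 50

# Standard normal: m = theta = 0, sigma = 1, mu' = sqrt(2/pi), f(0) = 1/sqrt(2 pi).
normal = asymptotic_parameters(
    0.0, 1.0, 0.0, math.sqrt(2.0 / math.pi), 1.0 / math.sqrt(2.0 * math.pi), k
)

# Unit exponential: m = 1, sigma = 1, theta = ln 2, f(theta) = 1/2,
# and E|x - ln 2| = ln 2 (a direct integration, stated here as an assumption).
ln2 = math.log(2.0)
exponential = asymptotic_parameters(1.0, 1.0, ln2, ln2, 0.5, k)
```

For the symmetric normal population the correlation is zero, in line with the remark at the end of the proof; for the skewed exponential population it is positive because m exceeds θ.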

PROOF: Let \(u = (M - \mu')\sqrt{2k}\) and \(v = (\xi - \theta)\sqrt{2k}\), where \(\mu' = E|x - \theta|\).
Then the simultaneous characteristic function of the two random variables \(u\)
and \(v\) is given by the following:
Big****) 


_ Efe ad ittnin 


2k+1 
E exp it > a lu —é|— 2) V 2k + itelé — OVE | 


= + as I [- fff 


2k+1 k 
ix te — a Lj 
- exp | tt ee — ul V2k + ihlt — 0/2 


F(as)f(ee) +++ f(xe)f(tese) +++ f(tors)df(é) 
AXox41 +++ AXpy9dr, +++ dxdt 


t 


(Qk pep pe ft 
- ar f. exp { Va & + wh se az 


| I : exp ‘VE @- u)\ f(x) az | Gre" pw 


Upon making the substitution \(\xi = \theta + y/\sqrt{2k}\), the above expression can be
reduced to the following form:


, _ @k+! f° ff pe Pt ' 
(1) (th, &) = \/ oR(kI) LLL exp | vai * u) | 02) ae 


‘ - sia | - Ti (x + uv) | f(x) az | 
° if exp | rs (x — w) | 12) ae 


~ _ exp E- (x — w) | §(0) ac |} 


ttoy Y 





[ exp | - ee + W) | s@ dx = 3 — vi [. (x + wu’)f(x) dx 


ti 
~ 22k) 


“+ wise) de + HECK MD ; 
and


- aty F at o ¥ 
[ exp E (x — u | f(a) dx = 3 ©? fax (x — w)f(x) dx 


ti - \2 7. . G2(2 
— SOK) ts (x — wu’) f(a) dx + — 


where for every fixed \(t_1\), \(\epsilon_1(2k, t_1)\) and \(\epsilon_2(2k, t_1) \to 0\) as \(k \to \infty\). Similarly, under
the substitution \(x = (z/\sqrt{2k}) + \theta\),


6-+(y/+/ 2k) ity 1 ? Z 
x = (x — w’ z) dx = — a> + 8) dz 
[ exp | \/ale (a u ) | #0) da Ja & f i 7 + ) ( 


ith y 2 Z ¢3(2k, ty) 
tn gees. Sin came , —_. ~ ee eee 
+ tL (ate #)t Gato) et SY, 
and 


0+(y/+vV/ 2k) i _ 1 y 2 
ex _-—= j, (w +u (zd = —= | r( = + s) dz 
I P L » | f(z) V2k Jo ° \V 2k 
rt; , & _ r =. ¢4(2k, ty) 
~ 3b (Fa sid ) (Fa . ) ——— 
where \(\epsilon_3(2k, t_1)\) and \(\epsilon_4(2k, t_1) \to 0\) as \(k \to \infty\) for each fixed \(t_1\). Substituting these
expressions in (1) and performing the indicated multiplications we find after
some calculation that (1) can be reduced to the following form:
> (Qk + 1)! ile + a) = 4ith(m — @)yf (Fe 7a. + s) 

g(t, b) = 2 /D4:(Ie1)? 27 oe ee 
on aut (A 75h. + ) + ¢(2k, t) 

an U $F ems ¢ + Sa) dy, 


where \(0 < z < y\) and \(\epsilon(2k, t_1) \to 0\) for every fixed \(t_1\) as \(k \to \infty\). Now taking the
limit as \(k \to \infty\), we have


lim o(t,, te) = [ RE = 2 exp | -% (co? — wu”) 


4 Ham — OFOy _ Oly’ + 5, v| f(0) dy. 


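Upon performing the integration,
\[\lim_{k \to \infty} \varphi(t_1, t_2) = \exp\left[-\frac{1}{2}\left(t_1^2\{(m - \theta)^2 + \sigma^2 - \mu'^2\} + \frac{2t_1t_2(m - \theta)}{2f(\theta)} + \frac{t_2^2}{4f^2(\theta)}\right)\right].\]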
Since \(\sigma^2 > \mu'^2\), this is the characteristic function for two variables which are
normally distributed. Thus, the simultaneous distribution of \(\xi\) and \(M\) is asymp-
totically normal. It is of interest to note that, if the pdf \(f(x)\) is symmetric, the
correlation coefficient is zero, and \(M\) and \(\xi\) are asymptotically independent. We
might also note that \(\varphi(t_1, 0)\) is the characteristic function for the mean deviation
from the sample median. Thus, the random variable \(M\) is asymptotically normal
with asymptotic mean and variance \(\mu'\) and \(\{(m - \theta)^2 + \sigma^2 - \mu'^2\}/2k\) respec-
tively.

The author wishes to express his appreciation to Professor A. T. Craig for 
valuable suggestions in the study of this problem. 
NOTE ON THE EXTENSION OF CRAIG’S THEOREM TO NON-CENTRAL
VARIATES


By OSMER CARPENTER


Carbide and Carbon Chemical Corporation, Oak Ridge







A theorem due to A. T. Craig [1] and H. Hotelling [3] concerning the distribu- 
tion of real quadratic forms in normal variates is extended to the case of non- 
central normal variates with equal variance. 

The following notation is used: \(A\), \(A_1\), \(A_2\) are real symmetric matrices, \(L\) is an
orthogonal matrix, \(\Gamma\) is a diagonal matrix of latent roots, and \(X\), \(Y\), \(M\) and \(U\)
are column vectors.

THEOREM. Let \(X' = (x_1, \cdots, x_n)\) be a set of normally and independently dis-
tributed variates with equal variance \(\sigma^2\) and means \(M' = (m_1, \cdots, m_n)\). Then,
a necessary and sufficient condition that a real symmetric quadratic form
\(Q(X) = X'AX\) of rank \(r\) be distributed as \(\sigma^2\chi'^2\), where

(1)    \[p(\chi'^2, r, \lambda) = \tfrac{1}{2}(\chi'^2/2)^{(r-2)/2}\, e^{-\lambda - \chi'^2/2} \sum_{j=0}^{\infty} \frac{(\lambda\chi'^2/2)^j}{j!\,\Gamma[(r + 2j)/2]},\]

is that \(A^2 = A\). If \(Q(X)/\sigma^2\) is distributed by \(p(\chi'^2, r, \lambda)\), then \(\lambda = Q(M)/2\sigma^2\).
Further, let \(Q_1(X) = X'A_1X\) and \(Q_2(X) = X'A_2X\) be real symmetric quadratic
forms of ranks \(r_1\) and \(r_2\). Then a necessary and sufficient condition that \(Q_1(X)\)
and \(Q_2(X)\) be statistically independent is that \(A_1A_2 = 0\).

PROOF. The theorem is proved by establishing the equivalence and factoriza-
tion of moment generating functions [4]. The moment generating function of
\(p(\chi'^2, r, \lambda)\) is
(2)    \[G(t) = E e^{t\chi'^2/2} = e^{\lambda t/(1 - t)}(1 - t)^{-r/2}.\]


Let \(x_1, \cdots, x_n\) be normally and independently distributed with means
\(E(x_i) = m_i\) and common variance \(\sigma^2\). Without loss of generality, we may take
\(\sigma = 1\), changing to the general case when necessary with the transformation
\(x_i' = x_i/\sigma\).

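Let \(Q(X) = X'AX\) be a real symmetric quadratic form of rank \(r\). Then the
moment generating function of \(Q(X)\) is

(3)    \[G_Q(t) = E e^{tQ/2} = (2\pi)^{-n/2}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} e^{-\frac{1}{2}[(X - M)'(X - M) - X'tAX]}\prod_{i=1}^{n} dx_i.\]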


If \(t\) is restricted to values such that \(|t| < |1/\gamma_0|\), where \(\gamma_0\) is the dominant
latent root of \(A\), then \(I - tA\) is positive definite and

(4)    \[G_Q(t) = (2\pi)^{-n/2}\, e^{\frac{1}{2}M'tA(I - tA)^{-1}M}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} e^{-\frac{1}{2}[X - (I - tA)^{-1}M]'(I - tA)[X - (I - tA)^{-1}M]}\prod_{i=1}^{n} dx_i = e^{\frac{1}{2}M'tA(I - tA)^{-1}M}\,|I - tA|^{-1/2}.\]


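If \(L\) is an orthogonal matrix such that

\[L'AL = \Gamma = \begin{pmatrix} \gamma_1 & 0 & \cdots & 0 \\ 0 & \gamma_2 & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & \gamma_n \end{pmatrix},\]

where the \(\gamma_i\) are the latent roots of \(A\), then the transformation \(M = LU\) gives

(5)    \[G_Q(t) = \exp\Big\{\tfrac{1}{2}\sum_{i=1}^{n}\frac{t\gamma_i u_i^2}{1 - t\gamma_i}\Big\}\prod_{i=1}^{n}(1 - t\gamma_i)^{-1/2}.\]

A necessary and sufficient condition that \(G_Q(t) = G(t)\) is that \(A^2 = A\). If
\(A^2 = A\), then all of the latent roots of \(A\) are \(+1\) or \(0\), and sufficiency can be
established by substituting the appropriate value of each \(\gamma_i\) into equation (5),
giving

(6)    \[G_Q(t) = e^{\lambda t/(1 - t)}(1 - t)^{-r/2} = G(t).\]

Also \(\lambda = \sum_i \gamma_i u_i^2/2 = \tfrac{1}{2}(U'\Gamma U) = \tfrac{1}{2}(M'AM) = Q(M)/2\).

It is apparent from the form of \(G_Q(t)\) that a necessary condition for
\(G_Q(t) = G(t)\) is that \(|I - tA|^{-1/2} = (1 - t)^{-r/2}\). But it has been proved by Craig [1]
that the condition \(A^2 = A\) is necessary, as well as sufficient, for this equality.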
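
The sufficiency argument can be illustrated numerically. The following sketch is only an illustration under stated assumptions: it takes σ = 1, uses the idempotent matrix A = I − J/3 (rank 2) and a non-idempotent multiple of it as hypothetical examples, and evaluates (4) with a small hand-written Gaussian elimination so that no external library is needed. All function names are introduced here for the example.

```python
import math

def solve_with_det(B, rhs):
    """Solve B x = rhs by Gaussian elimination with partial pivoting; also return det(B)."""
    n = len(B)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(B)]
    det = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if p != c:
            aug[c], aug[p] = aug[p], aug[c]
            det = -det
        det *= aug[c][c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[r][j] -= f * aug[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = aug[r][n] - sum(aug[r][j] * x[j] for j in range(r + 1, n))
        x[r] = s / aug[r][r]
    return x, det

def quad_form(A, M):
    """Q(M) = M'AM."""
    n = len(A)
    return sum(M[i] * A[i][j] * M[j] for i in range(n) for j in range(n))

def mgf_quadratic_form(A, M, t):
    """Equation (4) with sigma = 1: exp(M' tA (I - tA)^-1 M / 2) |I - tA|^(-1/2)."""
    n = len(A)
    B = [[(1.0 if i == j else 0.0) - t * A[i][j] for j in range(n)] for i in range(n)]
    y, det = solve_with_det(B, M)  # y = (I - tA)^-1 M
    Ay = [sum(A[i][j] * y[j] for j in range(n)) for i in range(n)]
    return math.exp(0.5 * t * sum(M[i] * Ay[i] for i in range(n))) / math.sqrt(det)

def mgf_noncentral(t, r, lam):
    """Equation (2): exp(lam t / (1 - t)) (1 - t)^(-r/2)."""
    return math.exp(lam * t / (1.0 - t)) * (1.0 - t) ** (-r / 2.0)

A = [[2 / 3, -1 / 3, -1 / 3], [-1 / 3, 2 / 3, -1 / 3], [-1 / 3, -1 / 3, 2 / 3]]  # A^2 = A, rank 2
A_bad = [[2.0 * v for v in row] for row in A]  # not idempotent
M = [1.0, 2.0, 4.0]
t = 0.3

gap_idempotent = mgf_quadratic_form(A, M, t) - mgf_noncentral(t, 2, quad_form(A, M) / 2.0)
gap_other = mgf_quadratic_form(A_bad, M, t) - mgf_noncentral(t, 2, quad_form(A_bad, M) / 2.0)
```

For the idempotent matrix the two generating functions agree to rounding error, while for the rescaled matrix they differ markedly, as the theorem leads one to expect.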

Next, let \(Q_1(X) = X'A_1X\) and \(Q_2(X) = X'A_2X\) be real symmetric quadratic
forms of ranks \(r_1\) and \(r_2\). Then from (4)

(7)    \[G(t_1, t_2) = E e^{(t_1Q_1 + t_2Q_2)/2} = e^{\frac{1}{2}M'(t_1A_1 + t_2A_2)(I - t_1A_1 - t_2A_2)^{-1}M}\,|I - t_1A_1 - t_2A_2|^{-1/2},\]
\(t_1\), \(t_2\) being restricted to values for which \((I - t_1A_1 - t_2A_2)\) is positive definite.
A necessary and sufficient condition that \(G(t_1, t_2) = G_{Q_1}(t_1)\cdot G_{Q_2}(t_2)\) is \(A_1A_2 = 0\).
The required equation in the moment generating functions is

(8)    \[G(t_1, t_2) = e^{\frac{1}{2}M't_1A_1(I - t_1A_1)^{-1}M}\,|I - t_1A_1|^{-1/2}\cdot e^{\frac{1}{2}M't_2A_2(I - t_2A_2)^{-1}M}\,|I - t_2A_2|^{-1/2}.\]
Assume \(A_1A_2 = 0\). Then \((I - t_1A_1 - t_2A_2) = (I - t_1A_1)(I - t_2A_2)\)
and \(|I - t_1A_1 - t_2A_2| = |I - t_1A_1|\cdot|I - t_2A_2|\). Also

\[(t_1A_1 + t_2A_2)(I - t_1A_1 - t_2A_2)^{-1} = t_1A_1(I - t_1A_1)^{-1} + t_2A_2(I - t_2A_2)^{-1},\]

for using the identity \(tA(I - tA)^{-1} = (I - tA)^{-1} - I\), this becomes

\[(I - t_2A_2)^{-1}(I - t_1A_1)^{-1} = (I - t_1A_1)^{-1} + (I - t_2A_2)^{-1} - I.\]

Multiplying both sides on the left by \((I - t_2A_2)\) and on the right by \((I - t_1A_1)\),
the identity follows. Thus the condition is sufficient.

It is apparent from the form of the moment generating functions that a
necessary condition for \(G(t_1, t_2) = G_{Q_1}(t_1)G_{Q_2}(t_2)\) is that \(|I - t_1A_1 - t_2A_2| =
|I - t_1A_1|\cdot|I - t_2A_2|\). However, it has been proved by Hotelling [3] and
Craig [2] that the condition \(A_1A_2 = 0\) is necessary for this equality.
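
The factorization in (8) can also be checked numerically. This is a hedged sketch, not the author's computation: it assumes σ = 1, takes the hypothetical pair A_1 = J/3 and A_2 = I − J/3 (for which A_1A_2 = 0), and contrasts it with a pair whose product is not zero. The single helper `mgf` evaluates both (4) and (7), since each has the form exp(M′C(I − C)⁻¹M/2)|I − C|^(−1/2) with C = tA or C = t_1A_1 + t_2A_2.

```python
import math

def solve_with_det(B, rhs):
    """Solve B x = rhs by Gaussian elimination with partial pivoting; also return det(B)."""
    n = len(B)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(B)]
    det = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if p != c:
            aug[c], aug[p] = aug[p], aug[c]
            det = -det
        det *= aug[c][c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[r][j] -= f * aug[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = aug[r][n] - sum(aug[r][j] * x[j] for j in range(r + 1, n))
        x[r] = s / aug[r][r]
    return x, det

def mgf(C, M):
    """exp(M' C (I - C)^-1 M / 2) |I - C|^(-1/2), assuming I - C is positive definite."""
    n = len(C)
    B = [[(1.0 if i == j else 0.0) - C[i][j] for j in range(n)] for i in range(n)]
    y, det = solve_with_det(B, M)  # y = (I - C)^-1 M
    Cy = [sum(C[i][j] * y[j] for j in range(n)) for i in range(n)]
    return math.exp(0.5 * sum(M[i] * Cy[i] for i in range(n))) / math.sqrt(det)

def scaled(t, A):
    return [[t * v for v in row] for row in A]

def added(P, Q):
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

def factorization_gap(A1, A2, M, t1, t2):
    """Joint generating function (7) minus the product on the right of (8)."""
    joint = mgf(added(scaled(t1, A1), scaled(t2, A2)), M)
    product = mgf(scaled(t1, A1), M) * mgf(scaled(t2, A2), M)
    return joint - product

third = 1.0 / 3.0
A1 = [[third] * 3 for _ in range(3)]                                           # J/3
A2 = [[(1.0 if i == j else 0.0) - third for j in range(3)] for i in range(3)]  # I - J/3
A3 = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]                       # A1 A3 != 0
M = [1.0, 2.0, 4.0]

gap_orthogonal = factorization_gap(A1, A2, M, 0.2, 0.3)
gap_overlapping = factorization_gap(A1, A3, M, 0.2, 0.3)
```

With A_1A_2 = 0 the gap vanishes up to rounding; with the overlapping pair it does not, which matches the necessity statement attributed to Hotelling and Craig.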


An extension can be made to correlated variates. Let \(X' = (x_1, \cdots, x_n)\)
be normally distributed with non-singular correlation matrix \(B\) and means
\(M' = (m_1, \cdots, m_n)\). Then there exists a non-singular transformation \(X = TZ\),
such that the variates \(Z\) are independent and have unit variance. Thus
\(T^{-1}BT'^{-1} = I\), \(B = TT'\) and \(Q(X) = X'AX = Z'T'ATZ\). Applying the theorem
proved above, a necessary and sufficient condition that \(Q(X)\) be distributed as
\(\chi'^2\) is that \((T'AT)^2 = T'ABAT = T'AT\), or that \(ABA = A\). As before,
\(\lambda = Q(M)/2\). In the same manner, a necessary and sufficient condition for
independence of \(Q_1(X)\) and \(Q_2(X)\) is that \((T'A_1T)(T'A_2T) = T'A_1BA_2T = 0\),
or that \(A_1BA_2 = 0\).
A SECOND FORMULA FOR THE PARTIAL SUM OF HYPERGEOMETRIC

SERIES HAVING UNITY AS THE FOURTH ARGUMENT 

By HERMANN VON SCHELLING 

Naval Medical Research Laboratory, New London, Connecticut 
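A convergent hypergeometric series with 1 as fourth argument has been
expressed by Gauss, using gamma functions, as follows:

(1)    \[F(\alpha, \beta, \gamma; 1) = 1 + \frac{\alpha\cdot\beta}{\gamma\cdot 1} + \frac{\alpha(\alpha + 1)\cdot\beta(\beta + 1)}{\gamma(\gamma + 1)\cdot 1\cdot 2} + \cdots = \frac{\Gamma(\gamma)\Gamma(\gamma - \alpha - \beta)}{\Gamma(\gamma - \alpha)\Gamma(\gamma - \beta)}.\]

Let us denote the \(\nu\)th partial sum of \(F(\alpha, \beta, \gamma; 1)\) by \(F_\nu(\alpha, \beta, \gamma; 1)\), and let us put

(2)    \[G_\nu(\alpha, \beta, \gamma) = \frac{\Gamma(\gamma - \alpha)\Gamma(\gamma - \beta)}{\Gamma(\gamma)\Gamma(\gamma - \alpha - \beta)}\,F_\nu(\alpha, \beta, \gamma; 1).\]

The following equation is obvious:
(3)    \[G_\nu(\alpha, \beta, \gamma) = G_\nu(\beta, \alpha, \gamma).\]
In [1] it is shown that
(4)    \[G_\nu(\alpha, \beta, \gamma) = 1 - G_\alpha(\nu, \gamma - \beta - \alpha, \gamma - \alpha + \nu)\]
is valid if \(\alpha\) is a positive integer.

If \((\gamma - \beta - \alpha)\) is a positive integer, (3) and (4) yield

\[G_\nu(\alpha, \beta, \gamma) = 1 - G_\alpha(\gamma - \beta - \alpha, \nu, \gamma - \alpha + \nu) = G_{\gamma-\beta-\alpha}(\alpha, \beta, \alpha + \beta + \nu).\]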

In terms of partial sums of the hypergeometric series this becomes

(5)    \[\frac{\Gamma(\gamma - \alpha)\Gamma(\gamma - \beta)}{\Gamma(\gamma)\Gamma(\gamma - \beta - \alpha)}\,F_\nu(\alpha, \beta, \gamma; 1) = \frac{\Gamma(\alpha + \nu)\Gamma(\beta + \nu)}{\Gamma(\nu)\Gamma(\alpha + \beta + \nu)}\,F_{\gamma-\beta-\alpha}(\alpha, \beta, \alpha + \beta + \nu; 1),\]

which is a new formula involving partial sums of hypergeometric series with 1
as fourth argument. It is more useful than (4) if \(\gamma - \beta - \alpha < \alpha\) or \(\gamma < 2\alpha + \beta\).
It is of theoretic interest that the arguments of the new series do not depend
on the third argument \(\gamma\) of the original series. Therefore it is possible to develop
a simple recursion formula. If we write (5) for \((\gamma - 1)\) instead of \(\gamma\), the series
of the second member has one term less. Subtracting these equations yields
after some simplifications








1 Opinions or conclusions contained in this paper are those of the author. They are 
not to be construed as necessarily reflecting the views or endorsement of the Navy De- 
partment. 







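(6)    \[(\gamma - \alpha - 1)(\gamma - \beta - 1)F_\nu(\alpha, \beta, \gamma; 1) - (\gamma - \beta - \alpha - 1)(\gamma - 1)F_\nu(\alpha, \beta, \gamma - 1; 1) = \frac{\Gamma(\nu + \alpha)}{\Gamma(\alpha)}\,\frac{\Gamma(\nu + \beta)}{\Gamma(\beta)}\,\frac{\Gamma(\gamma)}{\Gamma(\nu + \gamma - 1)}\,\frac{1}{\Gamma(\nu)}.\]


Many recursion formulas are known for hypergeometric functions, but (6) may
be the first equation of this type linking two hypergeometric partial sums of \(\nu\)
terms each.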
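
Relations (4), (5) and (6) can be tested with exact rational arithmetic. The sketch below is illustrative and restricted to positive integer arguments, so that every gamma function reduces to a factorial; the function names are hypothetical, and the ν-th partial sum is taken to mean the first ν terms of the series.

```python
from fractions import Fraction
from math import factorial

def gamma_int(n):
    """Gamma(n) for a positive integer n."""
    return factorial(n - 1)

def partial_sum(alpha, beta, gamma, nu):
    """First nu terms of F(alpha, beta, gamma; 1)."""
    total, term = Fraction(0), Fraction(1)
    for j in range(nu):
        total += term
        term *= Fraction((alpha + j) * (beta + j), (gamma + j) * (j + 1))
    return total

def G(alpha, beta, gamma, nu):
    """G_nu(alpha, beta, gamma) as defined in (2); needs gamma - alpha - beta >= 1."""
    prefactor = Fraction(
        gamma_int(gamma - alpha) * gamma_int(gamma - beta),
        gamma_int(gamma) * gamma_int(gamma - alpha - beta),
    )
    return prefactor * partial_sum(alpha, beta, gamma, nu)

def relations_hold(alpha, beta, d, nu):
    """Check (4), (5) and (6) for gamma = alpha + beta + d with integer d >= 1."""
    gamma = alpha + beta + d
    lhs = G(alpha, beta, gamma, nu)
    ok4 = lhs == 1 - G(nu, d, gamma - alpha + nu, alpha)
    ok5 = lhs == G(alpha, beta, alpha + beta + nu, d)
    left6 = ((gamma - alpha - 1) * (gamma - beta - 1) * partial_sum(alpha, beta, gamma, nu)
             - (d - 1) * (gamma - 1) * partial_sum(alpha, beta, gamma - 1, nu))
    right6 = Fraction(
        gamma_int(nu + alpha) * gamma_int(nu + beta) * gamma_int(gamma),
        gamma_int(alpha) * gamma_int(beta) * gamma_int(nu + gamma - 1) * gamma_int(nu),
    )
    return ok4 and ok5 and left6 == right6

all_ok = all(
    relations_hold(alpha, beta, d, nu)
    for alpha in range(1, 4)
    for beta in range(1, 4)
    for d in range(1, 4)
    for nu in range(1, 5)
)
```

Because the arithmetic is exact, agreement here is an identity check on the tested integer cases rather than a floating-point approximation.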

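In order to demonstrate the numerical advantage of the new formula (5),
we restate the example of [1]. An urn may contain \(N\) balls of which \(a\) are black and
\(b\) white. A single ball is drawn. We note its color, return the ball into the urn
and add \(\Delta\) balls of the same color. The probability that the \(n_1\)th black ball
appears at the latest in the \(n\)th drawing is

(7)    \[W(n) = \frac{\frac{a}{\Delta}\left(\frac{a}{\Delta} + 1\right)\cdots\left[\frac{a}{\Delta} + n_1 - 1\right]}{\frac{N}{\Delta}\left(\frac{N}{\Delta} + 1\right)\cdots\left[\frac{N}{\Delta} + n_1 - 1\right]}\,F_{n-n_1+1}\left(n_1, \frac{b}{\Delta}, \frac{N}{\Delta} + n_1; 1\right).\]

If \(a/\Delta\) is a positive integer (5) yields

(8)    \[W(n) = \frac{(n - n_1 + 1)(n - n_1 + 2)\cdots n}{\left(\frac{b}{\Delta} + n - n_1 + 1\right)\left(\frac{b}{\Delta} + n - n_1 + 2\right)\cdots\left(\frac{b}{\Delta} + n\right)}\,F_{a/\Delta}\left(n_1, \frac{b}{\Delta}, n + \frac{b}{\Delta} + 1; 1\right).\]

If we take
\[\Delta = 1, \quad a = 1, \quad b = N - 1,\]

we get
(9)    \[W(n) = \frac{n!\,(N + n - n_1 - 1)!}{(n - n_1)!\,(N + n - 1)!}.\]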

Calculating \(W(n)\), using the original formula (7), is quite tedious, but (5)
sometimes simplifies the numerical work. Let us calculate the probability \(W(6)\)
that the third black ball appears in the 6th drawing, if the number of the original
balls is \(N = 10\). Using formulas (7), (4), and (9) respectively we have

\[W(6) = \frac{1\cdot 2\cdot 3}{10\cdot 11\cdot 12}\left[1 + \frac{3\cdot 9}{13\cdot 1} + \frac{(3\cdot 4)(9\cdot 10)}{(13\cdot 14)(1\cdot 2)} + \frac{(3\cdot 4\cdot 5)(9\cdot 10\cdot 11)}{(13\cdot 14\cdot 15)(1\cdot 2\cdot 3)}\right] = \frac{4}{91},\]

\[W(6) = 1 - \frac{12!\,9!}{13!\,8!}\left[1 + \frac{4\cdot 1}{14\cdot 1} + \frac{(4\cdot 5)(1\cdot 2)}{(14\cdot 15)(1\cdot 2)}\right] = \frac{4}{91},\]

\[W(6) = \frac{6!\,12!}{3!\,15!} = \frac{4}{91}.\]
The time saved in using both formulas, of course, increases as the number of
terms, \(n - n_1 + 1\), of the original series, increases.
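
The value W(6) = 4/91 can be confirmed directly from the urn scheme. The sketch below is an independent illustration, not the method of the note: it follows the number of black balls drawn by dynamic programming with exact fractions and compares the result with the closed form (9). The function names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def W_direct(n, n1, a, b, delta):
    """P(at least n1 black balls in n draws) for the urn with reinforcement delta."""
    probs = {0: Fraction(1)}  # number of black balls drawn so far -> probability
    for draw in range(n):
        total = a + b + draw * delta  # balls in the urn before this draw
        nxt = {}
        for k, p in probs.items():
            black = a + k * delta     # black balls currently in the urn
            nxt[k + 1] = nxt.get(k + 1, 0) + p * Fraction(black, total)
            nxt[k] = nxt.get(k, 0) + p * Fraction(total - black, total)
        probs = nxt
    return sum(p for k, p in probs.items() if k >= n1)

def W_formula9(n, n1, N):
    """Closed form (9), valid for delta = 1, a = 1, b = N - 1."""
    return Fraction(
        factorial(n) * factorial(N + n - n1 - 1),
        factorial(n - n1) * factorial(N + n - 1),
    )

w6_direct = W_direct(6, 3, a=1, b=9, delta=1)
w6_formula = W_formula9(6, 3, N=10)
```

The event that the third black ball appears at the latest in the 6th drawing is the same as drawing at least three black balls in six draws, which is what `W_direct` accumulates.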

Let us mention that the special distribution corresponding to (9) does not
have finite moments. For arbitrary values of \(N\), \(a\), \(\Delta\) the arithmetic mean is

(10)    \[E(n) = \frac{N - \Delta}{a - \Delta}\,n_1,\]

the expectation of \(n(n + 1)\) is

(11)    \[E[n(n + 1)] = \frac{(N - \Delta)(N - 2\Delta)}{(a - \Delta)(a - 2\Delta)}\,n_1(n_1 + 1),\]

and finally the variance

(12)    \[\sigma^2(n) = \frac{(N - \Delta)(N - a)[(n_1 - 1)\Delta + a]}{(a - \Delta)^2(a - 2\Delta)}\,n_1.\]

The mode can be derived from the fact that

(13)    \[w(n + 1) = w(n) \quad\text{for}\quad n = \frac{N}{a + \Delta}(n_1 - 1).\]

Especially we get \(w(11) = w(10)\) for our numerical example.

The mean and variance do not exist for \(a = \Delta = 1\), as in our example. How-
ever, it is possible to find a number \(n\) so that \(W(n)\) takes any value near to
unity, for instance .99. For large \(n\) and small \(n_1\) (9) yields the approximation

\[W(n) = \frac{n(n - 1)\cdots(n - n_1 + 1)}{(N + n - 1)(N + n - 2)\cdots(N + n - n_1)}\]
ny ae 1 ny 
oe 
ore Ss 
N+a- a 
\ z 
Hence, W(2666) = .99 for our example. One needs 2666 trials if one wants a 


99% probability for getting three black balls. This surprising result cannot be 
derived from the original formula (7). 
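
The statements about the mode and about W(2666) can be examined with formula (9) written as a product. In this sketch the function names are hypothetical, and w(n) is taken to be W(n) − W(n − 1), the probability that the n_1-th black ball appears exactly at the n-th drawing; that reading of the author's notation is an assumption.

```python
from fractions import Fraction

def W(n, n1=3, N=10):
    """Formula (9) as a product of n1 ratios, for delta = 1, a = 1, b = N - 1."""
    value = Fraction(1)
    for j in range(n1):
        value *= Fraction(n - j, N + n - 1 - j)
    return value

def w(n, n1=3, N=10):
    """Probability that the n1-th black ball appears exactly at drawing n (assumed meaning)."""
    return W(n, n1, N) - W(n - 1, n1, N)

mode_tie = (w(11) == w(10))      # equation (13) gives n = 10 * (3 - 1) / (1 + 1) = 10
w_2666 = float(W(2666))          # should round to .99
```

Evaluating `W` at neighbouring values of n shows how slowly the probability approaches unity in this example.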
ABSTRACTS OF PAPERS
(Abstracts of papers presented at the Chicago meeting of the Institute, April 28-29, 1950) 


1. The Distribution of the Quotient of Ranges in Samples from a Rectangular
Population. Paul R. Rider, Washington University, St. Louis, Missouri.


The distribution of the quotient of the ranges of two independent, random samples from 
a continuous rectangular population is derived. The distribution is independent of the 
population range and can be used to test the hypothesis that two samples came from the 
same rectangular population just as the distribution of the variance ratio is used to test 
whether two samples came from the same normal population. 


2. A Geometric Method for Finding the Distribution of Standard Deviations
when the Sampled Population Is Arbitrary. (Preliminary Report). Paul
Irick, Purdue University.


asx<ax<xb,letri = tin, — 2%; 20,2 = 1,2,--- ,n—1. Make the transformation 


ree —p/irint g/t ni, 
2% 22 


and call U’ the 1/n! portion of the r’ space bounded by the n — 1 sphere and hyperplanes, 


For an ordered random sample, x; S x22 S --: S 2,, chosen from a population, f(z), 

















n—1 ° 

22 , _ ° ° 

~ rz; = 2ns?,7r; = iV: St , i =1,2, --- , n — 1, where s is the sample standard 
t 


deviation. The point density in U’, 6(r’), is the transform of 


b-—=r; 
8(r) = [ fledfler +n) «++ fle, +n tee + rea) dn. 


1™=a 


Change to generalized polar coordinates and call \(U\) the outer hyperspherical boundary
of \(U'\) whereon the density is designated by \(\delta(\sqrt{2n}\,s, \varphi)\). Then \(p(s)\), the probability law for
\(s\), is given by


p(s) ds = n!n”/2sn-2 as [ ee [ 5(+/2ns, ¢) sin"=* g «++ sin gas dgne--- dei, 
#1 Pn—2 


where 


arc cos ¥ccars***™ $= [tanec |,i= 1,2, »n— 2, 
(n —i)(4¢@+ 1) t+1 


whenever \(b\) is infinite. The distribution of sample range is readily found in \(U'\) and is
expressible in the same form as \(p(s)\) with the same limits of integration. When \(b\) is finite,
the complete integral holds only for \(0 < s \le (b - a)/\sqrt{2n}\), there being \(n^2/4\) connected arcs
in \(p(s)\) if \(n\) is even, and \((n^2 - 1)/4\) arcs if \(n\) is odd. The axes are rotated to give relatively
simple formulas for \(p(s)\) when \(n \le 4\), the case of \(n = 5\) also being discussed. The method
readily produces previously reported results for \(p(s)\). In the application of the method,
particular attention has been paid to the Type III and polynomial Type I populations. The
density function provides much information concerning the form of \(p(s)\) for various popula-
tions, and contours of constant \(\delta\) in \(U'\) are of theoretical interest.
3. Probability of a Correct Result with a Certain Rounding-off Procedure.
W. S. Loud, University of Minnesota.

Consider the problem of the addition of n numbers expressed in the base B of numeration. 
Supposing each number known to arbitrary accuracy, to obtain the sum accurate to k places, 
one may round off each number to (k + 1) places, add, and round the sum to k places. If 
the numbers are assumed uniformly distributed, the probability that the above procedure 
gives the correct result may be found explicitly by use of characteristic functions. If the 


base B is odd, the result is 2¢r8)-/ sin®u sin?Bu u-"—! du, and if the base B is even, 
0 


«o 
2(rB)- | sin? Bu cos uu-"—! du. Both formulas have the asymptotic formula 6'/?B(rn)-¥2 
0 


as n becomes infinite. 


4. Analysis of a One-person Game. (Preliminary Report). W. M. Kincaid,
University of Michigan.

The problem of allocation of supplies is one which arises in many military and economic 
connections. The present report discusses a game constructed as a model of a simple situa- 
tion of this type. The player is given a supply of cards, and receives payments for giving 
these up when certain random events occur during the period of play. 

The optimal strategy, which maximizes the expected value of these payments, is gov- 
erned by certain critical times such that the player’s response to a particular event depends 
on whether it occurs before or after one of these times. 
NEWS AND NOTICES
Readers are invited to submit to the Secretary of the Institute news items of interest


Personal Items 


Dr. Leo A. Aroian, on leave from Hunter College, is acting as a Research 
Physicist in charge of computations at the Hughes Aircraft Co., Department 
of Electronics and Guided Missiles, Culver City, California. 

Dr. Ralph A. Bradley from McGill University, Montreal, Canada will join 
the staff as Associate Professor in the Department of Statistics at Virginia 
Polytechnic Institute on July 1, 1950. He will devote the majority of his time 
to research on rank order statistics. 

Dr. E. R. Dalziel has relinquished his post as Assistant Master at Technical 
School, New Zealand, to become Senior Engineer with the Overseas Telecommu-
nication Commission, Australia.

On September 1, Dr. David Duncan from the University of Sydney, Sydney, 
Australia, will join the statistical staff of Virginia Polytechnic Institute as Asso- 
ciate Professor of Statistics. He will devote the majority of his time to teaching. 

Dr. C. H. Fischer has been promoted to the rank of Professor of Actuarial 
Mathematics in the Department of Mathematics and Professor of Insurance in 
the School of Business Administration, University of Michigan, Ann Arbor, 
Michigan. 
Dr. E. J. Gumbel, Professor of Statistics at the New York New School for
Social Research, has been appointed Consultant to the National Bureau of Stand- 
ards and has been awarded a Guggenheim fellowship for finishing a book on the 
theory of extreme values. 

Dr. Eugene Lukacs, who has been on leave from Our Lady of Cincinnati 
College and working as a Statistician for the U. S. Naval Ordnance Test Station,
Inyokern, California, is transferring to the Statistical Engineering Laboratory, 
National Bureau of Standards, Washington, D. C. 

Dr. R. B. Leipnik, formerly a member of the Institute for Advanced Study, 
has accepted a position as Assistant Professor of Mathematics at the University 
of Washington, Seattle. 

Mr. Harold C. Mathisen, Jr., of the Kaiser-Frazer Corporation has been 
transferred from Willow Run, Michigan where he was an Assistant to the Director 
of Sales, to Buffalo, New York, as Regional Credit-Distribution Supervisor. 

Mr. Jack Moshman has resigned from the U. S. Atomic Energy Commission
at Oak Ridge, Tennessee, to accept a position as Statistician with the Mathe- 
matics Panel of the Oak Ridge National Laboratory. 

Dr. D. N. Nanda is now acting as Senior Scientific Officer in statistics at the 
Technical Development Estt. Laboratory at Kanpur, India. 

Mr. Shanti A. Vora was awarded at the commencement June 5, 1950, the 
degree of Doctor of Philosophy in Mathematical Statistics from the University 
of North Carolina, Chapel Hill. His dissertation, entitled “Bounds on the Dis-
tribution of Chi-Square,” won the William Chambers Coker Award in Science
for 1950 granted by the Elisha Mitchell Scientific Society for excellence in re- 
search in all the scientific departments of the university. He has been appointed 
Acting Assistant Professor in the Department of Statistics at Stanford Uni- 
versity, California, effective July 1, 1950, where he will be principally employed 
in research on sampling inspection. 

Professor Abraham Wald, Chairman, Department of Mathematical Statistics, 
Columbia University, gave a series of lectures on the theory of statistical decision 
functions at the Naval Ordnance Test Station, Inyokern, California, April 3-7, 
1950. Representatives from several organizations and educational institutions 
on the Pacific coast attended the lectures. 




A copy of the bulletin of the Graduate School of Public Health, University 
of Pittsburgh, has been received at the Secretary’s office. The program of the 
Department of Biostatistics will be of particular interest to readers of the Annals. 
The teaching and research activities of the Department of Biostatistics are aimed 
primarily at the development of methods for the statistical appraisal of the 
health problems of groups: the community, the family, and the special aggregates 
such as the population in industry and in school. 
The Educational Testing Service is offering for 1951-52 its fourth series of
research fellowships in psychometrics leading to the Ph.D. degree at Princeton 
University. Open to men who are acceptable to the Graduate School of the 
University, the two fellowships each carry a stipend of $2,375 a year and are 
normally renewable. Fellows will be engaged in part-time research in the general 
area of psychological measurement at the offices of the Educational Testing 
Service and will, in addition, carry a normal program of studies in the Graduate 
School. Competence in mathematics and psychology is a prerequisite for obtain- 
ing these fellowships. Information and application blanks may be obtained from: 
Director of Psychometric Fellowship Program, Educational Testing Service, 20 
Nassau Street, Princeton, New Jersey. 

Preliminary Actuarial Examinations 
Prize Awards 


The winners of the prize awards offered by the Society of Actuaries to the 
nine undergraduates ranking highest on the score of Part 2 of the 1950 Prelimi- 
nary Actuarial Examinations are as follows: 

First Prize of $200

Mattuck, Arthur P. ........................ Swarthmore College

Additional Prizes of $100

Dempster, Arthur P. ....................... University of Toronto
[name illegible] .......................... University of Buffalo
[name illegible] .......................... University of Minnesota
[name illegible] .......................... University of Toronto
[name illegible] .......................... University of Western Ontario
[name illegible] .......................... Princeton University
[name illegible] .......................... College of the Holy Cross
[name illegible] .......................... University of Toronto

The Society of Actuaries has authorized a similar set of nine prizes for the 
1951 examinations on Part 2. 
The Preliminary Actuarial Examinations consist of the following three ex- 
aminations: 
Part 1. Language Aptitude Examination. 
(Reading comprehension, meaning of words and word relationships, antonyms, and 
verbal reasoning.) 
Part 2. General Mathematics Examination. 
(Algebra, trigonometry, coordinate geometry, differential and integral calculus.) 
Part 3. Special Mathematics Examination. 
(Finite differences, probability and statistics.) 


The 1951 Preliminary Actuarial Examinations will be prepared by the Educa- 
tional Testing Service and will be administered by the Society of Actuaries at 
centers throughout the United States and Canada on May 18, 1951. The closing 
date for applications is March 15, 1951. 

Detailed information concerning the Examinations can be obtained from: 

The Society of Actuaries 
208 South LaSalle Street 
Chicago 4, Illinois
New Members


The following persons have been elected to membership in the Institute 


(March 1, 1950 to May 31, 1950) 


Ard, Everett E., B.S. (Kansas State Teachers College), Student, University of Michigan, 
1657 Monson Court, Willow Run, Michigan. 

Bainbridge, T. R., B.S. (Clemson College, S. C.), Supervisor, Koda Quality Inspection
Group, Tennessee Eastman Corporation, Kingsport, Tennessee. 

Bankier, James D., Ph.D. (Rice Institute), Associate Professor, Mathematics Department, 
McMaster University, Hamilton, Ontario, Canada. 

den Broeder, Jr., George G., B.S. (Wayne Univ.), Student, Wayne University, 459 East 
Grand Boulevard, Detroit 7, Michigan. 

Casas, Luis T., Ph.D. (Univ. of Bogota, Colombia), Professor of Statistics, Universidad de 
los Andes and Facultad de Economia Industrial y Comercial del Gimnasio Moderno, 
also Statistician, Compania Colombiana de Seguros and Compania Colombiana de
Seguros de Vida, Apartado Nacional No. 2088, Bogota, Colombia. 

Clark, Charles R., B.S. (Univ. of Michigan), Student, University of Michigan, 1215 West 
Cross Street, Ypsilanti, Michigan. 

Dolby, James L., M.A. (Wesleyan Univ.), Mathematical Physicist, Belding-Heminway, 
Inc., 66 Grove Street, Putnam, Connecticut. 

Elfving, Gustav, Ph.D. (Helsingfors, Finland), Professor of Mathematics, University of 
Helsingfors, Finland, now visiting Professor, Mathematics Department, Cornell Uni- 
versity, Ithaca, New York. 

Embody, Daniel R., M.S. (Cornell Univ.), Staff Statistician, The Washington Water Power 
Company, P.O. Drawer 1445, Spokane 6, Washington. 

Frazier, David, Ph.D. (Stanford Univ.), Research Chemist, Chemical and Physical Re- 
search Division, The Standard Oil Company (Ohio), 2127 Cornell Road, Cleveland 6, 
Ohio. 

Graf, Herman S., B.A. (Alfred Univ.), Student, Department of Mathematical Statistics, 
University of North Carolina, 58 Winans Drive, Yonkers 2, New York. 

Greenberg, Bernard G., Ph.D. (N. C. State College), Associate Professor and Acting Head, 
Department of Biostatistics, School of Public Health; Associate Professor, Institute 
of Statistics, Raleigh, North Carolina. 

Grenander, Ulf, Ph.D. (Stockholm Univ.), Department of Mathematical Statistics, Norr- 
tullsgatan 16, Stockholm, Sweden. 

Grosh, Jr., Louis E., M.S. (Purdue Univ.), Research Assistant, Mathematics Department, 
Purdue University, W. Lafayette, Indiana. 

Hoffman, Walter, M.A. (Wayne Univ.), Statistician, Research Laboratory, Childrens Fund 
of Michigan, 2903 Elmhurst, Detroit 6, Michigan. 

Hoffman, Robert G., A.B. (Stanford Univ.), Student, University of Michigan, 420 Thompson 
Street, Ann Arbor, Michigan. 

Hopkins, George D, B.S. (Ohio State), Statistician, Sylvania Electric Products, Inc., 
Ottawa, Ohio, 404 N. Jameson Avenue, Lima, Ohio. 

Horowitz, Jacob, B.S. (Columbia Univ.), Graduate Student, Department of Mathematical 
Statistics, Columbia University, 552 Riverside Drive, New York 27, New York. 

Huntsberger, David V., M.S. (West Va. Univ.), Graduate Student, Iowa State College, 
Ames, Iowa, 224 Pammel Court, Ames, Iowa. 

Kempff-Mercado, Rolando, Lic. en Ciencias Econ. (Univ. Mayor de San Andres), Secretary 
General of Yacimientos Petroliferos Fiscales Bolivianos (Bolivian Oil Field Authority), 
P.O. Box 1283, La Paz, Bolivia. 

Kennedy, Muriel E., B.Sc. (University of Alberta), Statistician, Special Surveys Divi-
sion, Dominion Bureau of Statistics, 128 Mason Terrace, Ottawa, Ontario, Canada.

Lander, Elmer L., B.A. (Western Reserve Univ., Cleveland), Student, University of Michi-
gan, 13312 Chapelside Avenue, Cleveland 20, Ohio.
Li-Min, Tang, M.S. (Univ. of Mich.), Student, University of Michigan, 1109 Willard, Ann
Arbor, Michigan. 

Lin, Shao-kung, M.A. (Louisiana State Univ.), Student, Department of Economics, Uni- 
versity of Illinois, 1202} W. University Avenue, Urbana, Illinois. 

Mandelson, Joseph, B.S. (College of City of New York), Mathematical Statistician, Chief, 
Quality Assurance Branch, Inspection Division, Office of Chief, Army Chemical Cen- 
ter, Maryland, 80 Cedar Street, Edgewood Heights, Maryland. 

Marthens, Arthur S., B.S. (Carnegie Inst. of Tech.), Mathematical Statistician, Bureau of 
Ships, Navy Department, Washington, D. C., 1820 Montier Street, Wilkinsburg, Penn- 
sylvania. 

Masel, Marvin, A.M. (Columbia Univ.), Engineering Statistician, Goodyear Aircraft Corp., 
Akron, Ohio, Y.M.C.A., Room 809, 80 Center Street, Akron, Ohio. 

McCune, Duncan C., B.A. (College of Wooster), Graduate Assistant, Mathematics Depart- 
ment, Purdue University, Lafayette, Indiana. 

Meyer, Paul L., B.S. (Univ. of Washington), Research Fellow, Laboratory of Mathe- 
matical Statistics, University of Washington, 9200-5th, N.E., Seattle 5, Washington. 

Milberg, Stanley, M.A. (Columbia Univ.), Statistician, 220 Turrell Avenue, So. Orange, 
New Jersey. 

Miser, Hugh J., Ph.D. (Ohio State Univ.), Operations Analyst Headquarters, United States 
Air Force, 2713 Blaine Drive, Chevy Chase 15, Maryland. 

Morrison, Milton, M.A. (Columbia Univ.), Instructor of Mathematics, Stevens Institute of 
Technology, Hoboken, New Jersey. 

Mulholland, Hugh P., Ph.D. (Cambridge Univ., England), Associate Professor of Mathe- 
matics, American University of Beirut, Beirut, Lebanon. 

Nelson, A. Carl, M.S. (Univ. of Delaware), Instructor in Mathematics, University of 
Delaware, Newark, Marshallton, Delaware. 

Neuwirth, Sidney I., B.A. (N.Y. Univ.), Statistician, Biological Research Laboratories, 
Schering Corporation, 86 Orange Street, Bloomfield, New Jersey. 

Pierce, James A., B.A. (Westminster College, Fulton, Mo.), Graduate Assistant, Purdue 
University, 205 Sylvia, West Lafayette, Indiana. 

Powell, Claude J., B.S. (Univ. of Tennessee), Quality Control Engineer, North American
Rayon Corp., 615 Hattie Avenue, Elizabethton, Tennessee.

Rojas, Basilio A., B.S. (National College of Agriculture, Mexico), Graduate Student, Iowa 
State College, Statistical Laboratory, Ames, Iowa. 

Roseboom, John H., M.S. (Dartmouth College), Instructor, Department of Economics, 
Indiana University, Bloomington, Indiana. 

Sandlin, William T., A.B. (Marshall College, Huntington, W.Va.), Independent Sales 
Engineer, 20 Fairfax Drive, Huntington, West Virginia. 

Schmitt, Samuel A., B.S. (Univ. of Chicago), Research Analyst, Department of Defense, 
Washington, D. C., 1410 N. Rhodes Street, Arlington, Virginia. 

Shaw, Richard H., M.S. (Purdue Univ.), Research Fellow, Purdue University, F.P.H.A. 
513-1 Airport Road, West Lafayette, Indiana. 

Smith, Hugh F., M.S.A. (Cornell Univ.), Professor of Experimental Statistics, Institute of 
Statistics, University of North Carolina, Box 5457, College Station, Raleigh, North 
Carolina. 

Sommers, Lysle D., B.S. (Bowling Green State Univ.), Sampling Assistant, Survey Re- 
search Center, University of Michigan and Graduate Student, 1532 Leeds Court, Willow 
Run, Michigan. 

Springer, Clifford H., M.Sc. (Purdue Univ.), Instructor, Department of Mathematics, Re- 
search Assistant, Statistical Laboratory, Recitation Building, Purdue University, 
West Lafayette, Indiana. 

Stearman, Robert L., M.S. (Oregon State College), Teaching Fellow, Department of Mathe- 
matics, Oregon State College, Corvallis, Oregon. 

Tingey, Fred H., M.S. (Univ. of Washington), Research Associate, University of Washing- 
ton, 3310 Goldendale Place, Seattle 5, Washington. 

Tipton, Lamar B., M.A. (Columbia Univ.), Statistical Clerk, Standard Oil Co. of New Jer- 
sey, 6 W. 604th Street, Shanks Village, Orangeburg, New York. 

Topp, Chester W., M.A. (Univ. of Illinois), Associate Professor of Mathematics, Fenn 
College, 1524 Comton Road, Cleveland Heights 18, Ohio. 

Vora, Shanti A., M.Sc. (Bombay), Student, Department of Mathematical Statistics, Uni- 
versity of North Carolina, 210 A, Phillips Hall, Chapel Hill, North Carolina. 

Willis, Myron J., A.M. (Indiana Univ.), Instructor of Mathematics, Purdue University, 
Statistical Laboratory, Lafayette, Indiana. 

REPORT OF THE CHICAGO MEETING OF THE INSTITUTE 


The forty-third meeting and the first regional Mid-western meeting of the 
Institute of Mathematical Statistics was held on the campus of the University 
of Chicago, Chicago, Illinois on Friday and Saturday, April 28 and 29, 1950. 
The morning session on April 29 was held jointly with the American Mathe- 
matical Society. The following forty-six members of the Institute were registered 
as present: 


K. J. Arnold, Max Astrachan, Reinhold Baer, Alvin G. Brooks, I. W. Burr, P. G. Carl- 
son, Herman Chernoff, P. S. Dwyer, H. P. Evans, J. S. Frame, Mary Goins, R. D. Gordon, 
John Gurland, P. R. Halmos, P. C. Hammer, W. L. Hart, M. A. Hatke, P. E. Irick, Howard 
L. Jones, Leo Katz, J. P. Kelly, W. M. Kincaid, L. A. Knowler, Tjalling Koopmans, F. C. 
Leone, F. W. Lott, W. G. Madow, A. M. Mark, John W. Mauchly, Kenneth May, Duncan 
C. McCune, Cyril G. Peckham, G. B. Price, P. R. Rider, Norman Rudy, L. J. Savage, G. R. 
Seth, Richard H. Shaw, Jack Sherman, M. D. Springer, Robert G. D. Steel, Z. Szatrowski, 
J. V. Talacko, R. M. Thrall, L. M. Weiner, M. E. Wescott. 


Professor Lloyd A. Knowler of the University of Iowa presided at the Friday 
afternoon session. The program consisted of the following invited papers: 


1. Why and Where Should Courses in Statistics Be Offered to Engineering Students? M. E. 
Wescott, Northwestern University. 

2. What and How Statistics Should be Taught to Engineering Students. I. W. Burr, Pur- 
due University. 


Following this session a tea was given by the Department of Mathematics of 
the University of Chicago. 

Professor John Gurland of the University of Chicago presided at the Saturday 
morning session. This session was held jointly with the American Mathematical 
Society. The program was as follows: 


1. The Distribution of the Quotient of Ranges in Samples From a Rectangular Population. 
Paul R. Rider, Washington University, St. Louis, Missouri. 

2. A Geometric Method for Finding the Distribution of Standard Deviations when the Sam- 
pled Population Is Arbitrary. (Preliminary report). Paul Irick, Purdue University. 

3. Probability of a Correct Result with a Certain Rounding-off Procedure. W. S. Loud, 
University of Minnesota. 

4. Analysis of a One-person Game. (Preliminary report). W. M. Kincaid, University of 
Michigan. 

Professor W. G. Madow of the University of Illinois presided at the Saturday 
afternoon session. The program consisted of the following invited papers: 


1. Correlation and Regression with Matrix Factorization. P. S. Dwyer, University of 
Michigan. 

2. The Identification of Structural Characteristics. Tjalling Koopmans, University of 
Chicago, and Olav Reiersøl, University of Oslo, Norway. 


K. J. ARNOLD, 
Associate Secretary 